Isolating biological modulators from biodiverse gene fragment libraries

ABSTRACT

The present invention provides a method for identifying a modulator or mediator of a biological activity, which activity includes antigenicity and/or immunogenicity, said method comprising the steps of:
     (i) producing a gene fragment expression library derived from defined nucleotide sequence fragments; and
     (ii) assaying the expression library for at least an amino acid sequence derived from step (i) for a biological activity wherein that activity is different from any activity the amino acid sequence may have in its native environment.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 11/890,714, filed on Aug. 6, 2007, which is a continuation of U.S. patent application Ser. No. 11/198,436, filed on Aug. 4, 2005, which is a continuation of U.S. patent application Ser. No. 09/568,229, filed on May 5, 2000, which claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 60/132,711, filed May 5, 1999, the contents of which are incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to the field of screening gene libraries, and more particularly to the generation and screening of natural domain libraries derived from organisms with known genomic sequences. Methods for increasing the diversity of such biodiverse gene fragment libraries further by mutagenesis procedures are described. The present invention also provides the means by which a wide range of peptide-based therapeutics, prophylactics and diagnostic reagents may be developed.

General

Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in the specification, individually or collectively, and any and all combinations or any two or more of the steps or features.

The present invention is not to be limited in scope by the specific embodiments described herein, which are intended for the purpose of exemplification only. Functionally equivalent products, compositions and methods are clearly within the scope of the invention as described herein.

Bibliographic details of the publications numerically referred to in this specification are collected at the end of the description. All references cited, including patents or patent applications, are hereby incorporated by reference. No admission is made that any of the references constitute prior art.

As used herein the term “derived from” shall be taken to indicate that a specific integer may be obtained from a particular source albeit not necessarily directly from that source.

Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

BACKGROUND TO THE INVENTION

Biological interaction/activities, such as protein:protein interactions, antigen:antibody interactions, protein:nucleic acid interactions, protein:ligand interactions and nucleic acid:nucleic acid interactions are involved in a wide variety of processes occurring in living cells. For example, agonism and antagonism of receptors by specific ligands (including drugs, hormones, second messenger molecules, etc.) and antibody-antigen interactions may affect a variety of biological processes such as gene expression, cellular differentiation and growth, enzyme activity, metabolite flow and metabolite partitioning between cellular compartments, amongst others. DNA:protein and RNA:protein interactions are well known for their effects in regulating gene expression in both prokaryotic and eukaryotic cells, in addition to being critical for DNA replication and, in the case of certain viruses, RNA replication. In cases where the propagation of cells is deleterious, such as the replication of a pathogen or of a cancer cell, agents which target biological interaction/activities or functional structures are suitable candidates for therapy. For example, agents that block the function of membrane channels or disrupt cytoplasmic membranes by other means are attractive candidates for anti-microbial therapies against pathogens. Further, agents that interact with antigen-specific or non-specific functions of the immune system may provide immunological modulators or vaccines for allergy, autoimmunity, infectious disease, fertility and envenomation. For example, agents that have the antigenicity of microbial antigens, tumour antigens, allergens or autoantigens may be used for vaccines or immunotherapy.

Undesirable or inappropriate gene expression and/or cellular differentiation, cellular growth and metabolism may also be attributable, at least in many cases, to biological interaction/activities involving the binding and/or activity of proteinaceous molecules, such as transcription factors, peptide hormones, receptor molecules and enzymes, amongst others. In these cases, therapies can be envisaged which block such inappropriate interactions and/or which block the formation of inappropriate cellular structures.

Production of Peptides by Recombinant DNA Techniques

Peptides that can mediate or interfere with a diverse range of biological functions include natural peptides and peptides synthesised to represent a portion or a modified portion of a molecule known to mediate a target function. One source of such peptides is random peptide libraries constructed with random (or semi-random) oligonucleotides ligated into cloning sites of plasmid or phage vectors.

Vectors containing DNA encoding different peptides are transfected or transformed into bacteria or other hosts and cloned by standard plaque or colony purification procedures. Clones producing peptides with a desired activity can be isolated by a variety of screening or selection procedures which are fundamentally the same as the screening procedures used to detect polypeptides encoded by cDNA or cDNA fragments. These include the production of peptides as fusions with the coat proteins of bacteriophage or fusions with bacterial surface proteins so the peptides can be used as tags for affinity purification procedures; the production of peptides from hosts infected with phage or transformed with plasmids to produce arrays of colonies or plaques which can be screened for ligand-binding activity or biological activity such as inhibiting the growth of target bacteria or inducing the activation of genes in target bacteria; and positive selection strategies such as two hybrid cloning systems, where the peptide produced in the host microorganism binds to target proteins to form complexes which activate the expression of the reporter genes cloned into the same host. One of the significant advantages of phage display technology is that it enables the construction of libraries with very large complexities, i.e. 10¹⁰ to 10¹¹ individual clones.

Likewise, in ‘reverse two hybrid’ or ‘split two hybrid’ systems, libraries of appropriately expressed peptides can be screened for blockers of particular protein/protein interactions, which in turn reduces the expression of counter selectable reporter genes encoding toxic products.

Modification of Peptides for Utility and Optimisation

Once an active peptide or a ligand-binding peptide has been identified, it can be modified by a variety of procedures to optimise its utility. Modification may include: alterations in the amino acid residues which engage the target to improve their binding specificity and affinity; modifications which affect the display of the peptide including the valency of binding and constraint of particular conformations; and modifications to attach further functional moieties such as markers, toxins and co-activators.

Synthetic peptides can include residues other than the 20 amino acids found in nature and/or can be cyclised by means such as oxidation of flanking cysteine residues. In the case of peptides mimicking antibody epitopes, carriers containing the T-cell epitopes required to induce high affinity immune responses can be added by genetic techniques.

Examples of Peptides that Modulate Biological Systems

Peptides can be applied as therapeutics or lead molecules for designing therapeutics for disease including infection, cancer and metabolic disorders as well as agents for vaccines and immunotherapy, transplantation and diagnostics. The potential usefulness of such peptides has been demonstrated by the following examples:

Peptide Antimicrobial Agents

An antimicrobial effect has been demonstrated for natural peptides produced by frogs and insects and for artificially synthesised cationic peptides. A large variety of antibiotics are peptides or polypeptides. The granules of mammalian neutrophils produce families of antimicrobial polypeptides including azurocidin, cathepsin G and Cationic Antimicrobial Peptides (CAP57 and CAP37). In addition, neutrophils produce at least two families of antimicrobial peptides, the defensins and the bactenecins. Moreover, many natural antibiotics and antifungal drugs are composed of peptides. For example, the magainin family of antimicrobial α-helical peptides isolated from the skin of the African clawed toad, Xenopus laevis, form lethal pores in the cell membranes of certain microorganisms. Similarly, certain α-helical peptides derived from a variety of insect genera have antimicrobial activity. Recently, several rational design approaches have been used to isolate novel peptide antibiotics. For example, Tiozzo et al. used a “sequence template” approach in which candidate peptide sequences were designed from alignments of natural antimicrobial peptides [1]. The identification of virulence determinants in several pathogens presents other attractive targets for antimicrobial therapy. For example, Balaban and colleagues have recently identified an autoinducer of virulence in Staphylococcus aureus that controls the production of bacterial toxins involved in pathogenesis. The toxin genes are induced by a regulatory RNA molecule, RNAIII, that is induced by a threshold concentration of an endogenous protein, RNAIII Activating Protein (RAP) [2]. Peptide inhibitors of RAP might be expected to act as inhibitors of virulence. Indeed, a natural peptide inhibitor of RAP called RIP (RNAIII inhibiting peptide) is produced by a non-pathogenic strain of Staphylococcus aureus and appears to inhibit the RNAIII gene and to cause reduced virulence [2].

Peptide Modulators of Growth Regulation

The ability of peptides to affect key modulators of growth regulation has been demonstrated by Brent and colleagues, who used two hybrid screening to identify constrained peptide ‘aptamers’ from combinatorial libraries which bind tightly to and inhibit the function of cyclin dependent kinase 2. This demonstrates the potential for treatment of neoplasms (3).

Peptides can exhibit exquisite specificity. For example, peptide aptamers have been identified which can discriminate between two closely related allelic variants of the Ras protein (4). Moreover, a peptide aptamer against human cyclin dependent kinase 2 inhibits kinase activity exclusively on particular substrates.

Peptide specificity has also been demonstrated in vivo. In a recent report, expression of aptamers that recognised cyclin dependent kinases in transgenic flies was shown to cause developmental abnormalities in a dominant negative fashion (5). Importantly, the specificity of the two aptamers for particular Cdks (as determined by yeast two hybrid assays) was retained in the Drosophila in vivo assay. Moreover, co-expression of the specific aptamer target Cdk suppressed the developmental phenotype observed (5). This report of successful targeted inhibition of an enzyme in vivo with aptamers firmly establishes as practicable the principle for developing new therapeutic strategies based on interfering peptides.

Peptide-Based Inhibition: An Emerging Therapeutic Strategy

Much attention has recently focussed on peptides as potential therapeutic agents because they can be highly specific and readily synthesised. Phage display technologies are beginning to prove useful for providing peptide leads in drug discovery programs. Efficient delivery of peptides from outside the cell to the nucleus of eukaryotic cells can now be achieved by attaching sequences such as the targeting motif “penetratin”, which is derived from the Drosophila Antennapedia protein. More recently a family of such targeting peptides has been identified (6). For example, conjugation of peptide sequences to the VP22 protein has been shown to allow efficient export of the fusion protein to the nuclei of cells adjacent to primary transfectants (7). Several recent developments make it feasible to physically select conformationally constrained peptide domains in order to identify peptides that bind with very high affinity in vivo, favouring high potency. Mimetic peptides have been reported to inhibit protein interactions and/or enzyme function. Examples include a nonapeptide derived from the ribonucleotide reductase of herpes simplex virus that was linked to an enterotoxin subunit for delivery into cells via its receptor. The peptide conjugate was found to inhibit herpes simplex type 1 replication in quiescent Vero cells [8]. Using detailed knowledge of the PCNA-interaction domain of p21WAF1 derived from two hybrid screens, a peptide has been designed which effectively blocked the interaction. This 20-mer bound with sufficient affinity to block SV40 replication. A 20-mer peptide sequence derived from p16 has been found to interact with Cdk4 and Cdk6 and inhibited pRB phosphorylation and cell cycle progression [9]. The authors coupled the specific inhibitor peptide to the 16 residue penetratin peptide for efficient nuclear delivery. Peptides have even been shown to function as inhibitors in animal models. For example, a tetrapeptide mimicking the substrate of farnesyl protein transferase has also been shown to block the growth of Ras-dependent tumours in nude mice.

Peptide Mimotopes

Peptides functionally resembling the epitopes (mimotopes) bound by antibodies have been isolated and used as experimental vaccines to induce antibodies which protect against infection, as shown for hepatitis B, respiratory syncytial virus, Japanese encephalitis and Streptococcus pneumoniae. High affinity antibodies typically bind complex structures formed by the tertiary conformation of an antigen. The peptide mimotopes essentially convert a conformational epitope made from a complete protein into a small peptide. This has advantages when only certain epitopes are desired, e.g. to prevent immunopathology in RSV infection; or in the production of recombinant epitopes where the complete polypeptide may be difficult to fold; or where the entire antigen has undesirable biological properties (Staphylococcal toxins in toxic shock syndrome). In the case of carbohydrate antigens, polypeptides that contain the mimotope can be constructed to convert a T-cell independent antigen into a T-dependent antigen for the production of high affinity antibodies and immunogenicity in young animals including humans. Unlike the carbohydrates, peptide mimotopes can be produced as DNA vaccines.

The possibility of using mimotopes as antigens for cancer immunotherapy has been demonstrated for an adenocarcinoma antigen.

Mimotopes can be used as antigens to diagnose infectious disease by detecting antibody. The possibility has been demonstrated with hepatitis C infection.

Mimotopes representing the antigens recognised by autoantibodies against β-islet tissue in diabetes have been demonstrated and it has been proposed that these could be used to monitor the development of disease (10). Similarly, mimotopes have been found for pollen allergens which could be used in the diagnosis of allergic disease. In both these cases it is also possible that the mimotopes could be used for therapy by modulating the immune response or in prophylaxis.

Mimotopes representing transplantation antigens have been demonstrated and thus may be used as tolerogens or blockers to prevent transplantation rejection.

Ligand Interactions or Hormone Receptor Interactions

Peptide mimetics have been used as ligands to affinity purify biologically useful molecules, as shown for the purification of the blood clotting protein, von Willebrand factor.

The modification of enzyme activity with peptides mimicking substrates has also been demonstrated. Peptide mimetics can be used as hormones, as shown for erythropoietin, and can be modified to increase biological activity.

Recombinant Methods for Producing Biologically Active Peptides

The use of fragments from specific genes or cDNA to produce peptides containing a biological activity of the polypeptide encoded by the gene or an inhibitor of the activity can sometimes be successful. In other instances the activity can be dependent on the conformation of the complete polypeptide and cannot be obtained by these techniques. In many cases the use of random peptide libraries in phage or plasmids to produce a peptide which mimics the biological activity has been successful. This involves the screening of large numbers of clones producing an essentially random array of peptides for a peptide of the desired activity. The activity is sometimes mediated by a peptide which shows an amino acid sequence homology which could explain its biological activity, while in many cases the peptide acts as a mimetic for the conformation of the polypeptide or its ligand and has no sequence homology. Indeed, the peptide may be a mimetic of a chemically different molecule such as a carbohydrate. It is also possible to use the combinatorial library approach to screen for inhibitors or mediators of complex functions where there is no information on the molecular interactions required.

The ability to isolate active peptides from random fragment libraries can, however, be highly variable and problems with low affinity interactions have been reported, particularly for peptides required to represent complex conformations such as discontinuous epitopes bound by many antibodies. There is unpredictability in that libraries that are a rich source of peptides for one ligand may not contain peptides for others. While the ability to obtain desired peptides should be increased with libraries containing larger random peptides and more random peptides, there are practical difficulties in conducting high throughput screening or affinity purification, particularly since it has been shown that high-density affinity purification is inefficient. There is also uncertainty about the degree to which peptides isolated from the random peptide libraries will retain their binding or biological activity when produced as part of different delivery strategies such as fusions with different polypeptides. There is thus an opportunity to supplement or improve the existing technology with new strategies.

Biodiverse Peptide Domain Libraries from Defined Genomic Sources

Peptides present potential therapeutic and prophylactic agents for many human and animal diseases, biochemical disorders and adverse drug effects, because they can interact with other molecules with high specificity and affinity. However, a major problem to be overcome in the field of peptide therapeutics and prophylactics is the identification of specific amino acid sequences having a desired antagonist or agonist activity against a particular biological activity in a particular cellular environment. Such candidate peptide drugs may be particularly difficult to identify from truly random peptide libraries that lack any enrichment for sequences encoding molecular shapes suitable for binding biological structures. In contrast, nature has already assembled a rich source of such domains within the myriad of peptides, polypeptides and proteins encoded by the diverse range of genomes that make up the biosphere.

A wide range of different methods have been put forward to facilitate the screening of biological libraries (such as cDNA libraries) in an expedient manner to identify suitable protein or polypeptide molecules. Libraries of thousands and in some cases even millions of polypeptides or peptides have been prepared by gene expression systems and displayed on chemical supports or in biological systems suitable for testing biological activity. Generally such libraries are made from either individual genomes of organisms believed to be rich sources of new drugs (such as ‘extremophile’ bacterial species) or from a mixture of uncharacterised genomes isolated directly from the environment.

While the screening of biodiverse libraries has proven valuable, such libraries tend to be biased towards the frequency with which a particular organism is found in the native environment and may not necessarily represent the true population of the biodiversity found in a particular biological sample. Moreover, such screens are normally intended to isolate genes encoding enzymes, hence attempts are often made to bias such libraries to contain larger inserts which could be expected to encode biologically active enzymes.

In U.S. Pat. No. 5,763,239 in the name of Short et al., a procedure is described for normalising genomic DNA from an environmental sample, in an attempt to address this problem of bias. Because the libraries mentioned in that patent are generated from environmental samples for which little would be known about the genomic constitution of the library, the procedure employs complicated normalisation methods to normalise the genomic constitution of the libraries. While that procedure permits some normalisation of the genomes in an environmental sample, the methods that it describes are complicated, and there is a risk that rare genomic DNAs will be lost when the methods are applied and/or that new biases will be introduced by the procedure.

In addition to the above, current screening methods often rely on the isolation of genomic nucleic acid sequences using PCR amplification procedures for which little may be known about the genomic sequences. In such cases biases can be introduced through such factors as the presence of disproportional representation of repeated sequences in certain genomes. Furthermore, because no information is known about the genomic constitution of the environmental sample, only limited bioinformatic data can be derived from a screen of the library. This problem is addressed to some extent in U.S. Pat. No. 5,763,239, which seeks to increase the probability that a genomic sequence of low copy number in an environmental sample will have a chance of being represented in a library.

There are, however, currently no available methods for screening normalised biodiverse peptide domain libraries in vivo wherein the entire composition and complexity of the library can be accurately estimated and wherein the screening process provides comprehensive bioinformatic data useful for rational drug design. Moreover, no methods have been described which are specifically designed for the construction of natural genomic sequence libraries that have been optimised for the expression of domains per se, rather than entire polypeptides. Accordingly, there is a need to develop techniques that provide for the large-scale screening of peptide libraries which are enriched for sequences encoding bioactive domains useful in the determination of useful peptide therapeutics, the basis of which is not necessarily related to the natural role of particular peptide domains.

SUMMARY OF THE INVENTION

Proteins of different function show evidence of evolving by shuffling of domains (e.g. nerve growth factor and the low-density lipoprotein receptors) or by minor modifications of different residues within conserved domains (serine proteases). The present invention seeks to mimic this evolution by using peptide libraries encoded by known and defined nucleotide sequence fragments that are a rich source of peptides containing amino acid sequences evolved for diverse molecular interactions not necessarily closely related to the function performed within the donor organism. Also described are means of extending the diversity of biodiverse gene fragment libraries further by mutagenesis, either in vitro using PCR amplification under mutagenic conditions, or in vivo by replication of the library in ‘mutator’ bacterial strains which contain mutations in genes involved in mismatch repair of DNA.

The present invention provides a method for identifying a modulator or mediator of a biological activity, which activity includes antigenicity and/or immunogenicity, said method comprising the steps of:

-   (i) producing a gene fragment expression library derived from defined nucleotide sequence fragments; and
-   (ii) assaying the expression library for at least an amino acid sequence derived from step (i) for a biological activity wherein that activity is different from any activity the amino acid sequence may have in its native environment.
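As a rough illustration of step (i), the following Python sketch tiles a defined nucleotide sequence into overlapping fragments and translates each fragment in the three forward reading frames. It is an in silico model only, not the laboratory procedure; the function names, the fragment length and the step size are assumptions introduced for illustration, and the reverse strand is omitted for brevity.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def fragments(sequence, length, step):
    """Yield (start, fragment) pairs tiling a defined sequence with overlapping windows."""
    for start in range(0, len(sequence) - length + 1, step):
        yield start, sequence[start:start + length]


def translate(fragment, frame=0):
    """Translate one forward reading frame; an incomplete trailing codon is dropped."""
    usable = fragment[frame:]
    codons = (usable[i:i + 3] for i in range(0, len(usable) - len(usable) % 3, 3))
    # Unknown codons (e.g. containing N) are reported as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def peptide_library(sequence, length, step):
    """Map each fragment start position to its peptides in the three forward frames."""
    return {
        start: [translate(frag, frame) for frame in range(3)]
        for start, frag in fragments(sequence.upper(), length, step)
    }
```

Because every fragment is keyed by its start position in the defined sequence, any peptide recovered from a screen of such a model library can be traced back to its source coordinates.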

It will be appreciated that the present invention has broad reaching application for identifying amino acid sequences that have a novel activity compared to that which they are recognised as having in their ordinary natural environment. For example, the present invention is particularly useful for screening genome fragment expression libraries for amino acid sequences reactive with particular antibodies by, for example, affinity chromatography of a phage display library. Moreover, the present invention provides a means for defining amino acids essential for modulating a biological activity such as, for example, antibody binding. It also provides a means for isolating amino acid sequence modulators or mediators of a biological activity which are capable of functioning independently of the artificial constraints of the screening system by which they were identified (e.g. gene fusions etc.).

In particular, the present invention is useful for identifying novel therapeutics such as vaccines or immunotherapeutic antigens, antibiotics or inhibitory agents that may serve as candidate agonists and antagonists of any biological activity. For example, biodiverse gene fragment libraries may be used to produce antigens that can be used for vaccines or for immunotherapy of allergic disease or autoimmune disease. In the case of allergen immunotherapy it is especially desirable to obtain a high affinity peptide (which is rare from random peptide libraries) because it may be used as a monovalent antigen to avoid crosslinking of IgE on mast cells.

This system may also be used in high-throughput screening for agents which target specific protein:DNA, peptide:DNA, peptide:protein or protein:protein interactions or a structure such as the cell wall or a membrane transport component.

A distinct advantage of the technology described herein is that through having greater control over the composition of an amino acid sequence expression library by knowing its defined constitution, one can intentionally maximise the phylogenetic distance between the constituent genomes of the library to ensure a maximal degree of diversity which could, in principle, rival the sequence diversity of environmentally derived genome samples, notwithstanding the fact that such samples may contain more species diversity per se. This approach will become increasingly powerful as the range of available nucleotide sequences increases further.
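One way to picture the choice of phylogenetically distant genomes is a greedy farthest-point selection over a table of pairwise distances. The sketch below is illustrative only: the specification does not prescribe a selection algorithm, and the function name and the distance values used in the usage note are hypothetical.

```python
from itertools import combinations


def select_diverse_genomes(distance, k):
    """Greedily pick k genomes so each new pick is far from all earlier picks.

    distance maps frozenset({a, b}) to a pairwise phylogenetic distance.
    """
    names = sorted({name for pair in distance for name in pair})
    if k < 2 or k > len(names):
        raise ValueError("k must be between 2 and the number of genomes")
    # Seed with the most distant pair of genomes.
    chosen = list(max(combinations(names, 2), key=lambda p: distance[frozenset(p)]))
    while len(chosen) < k:
        remaining = [n for n in names if n not in chosen]
        # Add the genome whose nearest already-chosen neighbour is farthest away.
        chosen.append(max(
            remaining,
            key=lambda n: min(distance[frozenset((n, c))] for c in chosen),
        ))
    return chosen
```

For example, with hypothetical distances A-B 0.1, A-C 0.9, A-D 0.5, B-C 0.8, B-D 0.6 and C-D 0.4, a request for three genomes seeds the selection with A and C and then adds D, since B lies very close to A.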

In one embodiment there is provided a method for identifying a modulator or mediator of a biological activity, which activity includes antigenicity and/or immunogenicity, said method comprising the steps of:

-   (i) producing a gene fragment expression library derived from defined nucleotide sequence fragments, which nucleotide sequence encodes at least a sequence of amino acids;
-   (ii) assaying the expression library for at least an amino acid sequence derived from step (i) for a biological activity wherein the library is adapted to display a range of amino acid sequences each of which may vary by at least an amino acid; and
-   (iii) identifying those amino acids essential for modulating the biological activity, which activity is different from any activity with which the sequence is normally associated in its native environment.

A sequence of amino acids that is particularly effective in modulating or mediating a biological activity (e.g. antigenicity or immunogenicity) can be selected by comparing the observed activity from a series of different amino acid sequences of a similar constitution. Using differences in the observed activity it is possible to identify those amino acids essential for the activity and those which are either desired for the activity or in the alternate case those which are a hindrance to achieving effective activity.
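The comparison described above can be sketched, under simplifying assumptions, as a scan over single-residue variants of a reference peptide: a position is flagged as essential when substituting it reduces the measured activity below a chosen fraction of the reference activity. The function name, the threshold and the example activity values are hypothetical and are not taken from the specification.

```python
def essential_positions(reference, reference_activity, variants, loss_fraction=0.5):
    """Return 0-based positions whose single substitution sharply reduces activity.

    variants maps a variant peptide sequence to its measured activity.
    Variants that differ from the reference at more than one position, or
    in length, are ignored because the loss cannot be assigned to one residue.
    """
    essential = set()
    for sequence, activity in variants.items():
        if len(sequence) != len(reference):
            continue
        changed = [i for i, (a, b) in enumerate(zip(reference, sequence)) if a != b]
        if len(changed) == 1 and activity < loss_fraction * reference_activity:
            essential.add(changed[0])
    return sorted(essential)
```

With a reference peptide ACDE of activity 1.0 and hypothetical variant activities GCDE 0.9, AADE 0.1, ACAE 0.95 and ACDA 0.2, the function reports positions 1 and 3, that is, the cysteine and the glutamate, as essential.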

In a second embodiment the method may be employed to identify novel antibacterial peptides that are conditionally released from a fusion protein. According to this embodiment, there is provided a method of identifying an antibacterial peptide, comprising:

-   (i) transforming or transfecting a first bacterial population of cells with a peptide expression library derived from defined nucleotide sequence fragments;
-   (ii) growing said first bacterial population for a time and under conditions sufficient for expression of the amino acid sequences encoded within said library to occur and for release of the amino acid sequences from their cognate fusions;
-   (iii) contacting the expressed amino acid sequences with pathogenic bacteria;
-   (iv) identifying those sequence(s) that are capable of inhibiting the growth of the pathogenic bacteria, or killing the pathogenic bacteria; and
-   (v) selecting those sequences from the identification step in step (iv) that are not associated with the inhibition of growth of the pathogenic bacteria, or killing the pathogenic bacteria in their native environment.

In a third embodiment, there is provided a method for identifying a modifier of a biological activity associated with a host cell, said method comprising the steps of:

-   (i) Expressing a reporter molecule operably under the control of the biological activity in the cell, wherein at least a molecule associated with the biological activity comprises an amino acid sequence encoded by a nucleotide sequence that is placed operably in connection with a promoter;
-   (ii) Incubating at least a cell from step (i) in the presence of an amino acid sequence(s) from a gene fragment expression library derived from a defined genomic sequence, under conditions promoting interaction between the amino acid sequence(s) and a nucleotide or amino acid sequence involved with the biological activity;
-   (iii) Identifying at least an amino acid sequence that in the presence of the cells is capable of modifying expression of said reporter molecule, or the biological activity; and
-   (iv) Selecting those sequences in step (iii) that are not generally recognised as being able to modify expression of said reporter molecule, or the biological activity in their native environment.

Preferably the method described in this embodiment is repeated as often as is necessary to ensure that substantially all of the amino acids encoded by the defined nucleotide sequence are presented to the biological activity.
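How many clones must be screened before substantially all of a defined sequence has been presented can be estimated with a commonly used library-coverage formula, N = ln(1 - P) / ln(1 - f), where f is the insert size as a fraction of the genome size and P is the desired probability that any given stretch is represented. This formula is not stated in the specification; it is offered here as an assumption for illustration, and it ignores reading frame and insert orientation, so the number of clones needed for in-frame expression would be larger.

```python
import math


def clones_needed(genome_bp, insert_bp, probability):
    """Estimate clones required so a given stretch is present with the stated probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be shorter than the genome")
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))
```

As a toy case, a 1,000 bp sequence cloned as 500 bp inserts needs 4 clones for 90% confidence; realistic genome and insert sizes give correspondingly larger numbers, and raising the required probability always raises the estimate.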

In a fourth embodiment there is provided a method of identifying an antagonist of a biological activity, said method comprising the steps of:

-   (i) placing expression of a reporter molecule operably under the control of a biological activity in a cell, wherein at least one partner of said biological activity comprises an amino acid sequence encoded by a nucleotide sequence that is placed operably in connection with a bacterial-expressible promoter in a suitable vector, wherein (a) the nucleotide sequence is derived from a nucleotide sequence of known and sequenced origin and (b) the biological activity is different from any activity that the amino acid sequence may have in its native environment;
-   (ii) incubating the cell in the presence of a candidate compound to be tested for the ability to antagonise the biological activity; and
-   (iii) selecting cells wherein expression of said reporter molecule, or biological activity, is modified.

Any nucleotide sequence of known nucleotide composition may be used in the present invention. Preferably the nucleotide sequence is derived from a substantially sequenced genome of a microorganism and/or a compact eukaryotic species (i.e. a species with a high proportion of sequence encoding polypeptide). Most preferably, the nucleotide sequence is derived from a fully sequenced genome of a microorganism and/or a compact genome of a eukaryotic species, that is, a genome containing a high percentage of DNA encoding polypeptides.

Desirably, the present invention employs a peptide expression library made from defined genomic sequence present either in isolation or in combination with other defined genomic sequence to identify amino acid sequence(s) that may be suitable candidates for rational drug design while at substantially the same time providing comprehensive bioinformatic data about those candidates. The bioinformatic data derived from the method may be used to identify those amino acids important in modulating the biological activity.

In a fifth embodiment there is provided a method for identifying a modulator of a biological activity, said method comprising the steps of:

-   (i) producing an amino acid expression library derived from a defined genomic sequence;
-   (ii) contacting an amino acid sequence derived from the expression library with a reporter molecule that is operably under the control of a biological activity associated with a host; and
-   (iii) identifying an amino acid sequence capable of modulating the biological activity wherein that activity is different from any activity the amino acid sequence may have in its native environment.

In a sixth embodiment, there is provided a method for identifying an amino acid sequence that is capable of modulating a biological activity in a host cell, said method comprising the steps of:

-   (i) producing a library in a host wherein (a) the transformed cells of said library contain at least a first nucleotide sequence that comprises or encodes a reporter molecule the expression of which is operably under control of said biological activity and a second nucleotide sequence derived from a known genomic sequence that is capable of encoding the amino acid sequence when placed operably under the control of a suitable promoter sequence and wherein (b) substantially all of the known genomic sequence is present within the population of transformed cells making up said library and the biological activity is different from any activity the amino acid sequence may have in its native environment;
-   (ii) culturing said cellular host for a time and/or under conditions sufficient for expression of said second nucleotide sequence to occur; and
-   (iii) selecting or screening for cells wherein expression of said reporter molecule is modified.

Preferably, the method defined by the sixth embodiment also includes the additional steps of:

-   (iv) comparing the range of amino acid sequences that can be derived from the known genomic sequence against those sequences that exhibited biological activity; and
-   (v) determining those amino acids which are essential for modifying the reporter molecule activity.

In a particularly preferred form of the invention, a plurality of defined genomic sequences derived from different organisms may be expressed in the gene fragment expression library. Where genomic sequences from more than one organism are used in the method, the sequences are preferably provided in equal molar amounts to ensure that an equal proportion of each sequence is included in the method.

The complexity of the gene fragment expression library may also be augmented by subjecting the defined genomic sequence(s) derived from those sequences to methods that mis-read or mutate the sequence(s). Alternatively, or in addition, the complexity of the library may also be augmented by expressing the defined genomic sequence in each of its different reading frames. It may also be expressed in its reverse reading frames. Thus, allowing for expression of a gene sequence in each possible reading frame, for any particular sequence there will be six different possible combinations (three forward and three reverse reading frames).
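By way of non-limiting illustration only, the six reading frames of a fragment may be enumerated computationally as sketched below. The function names and the example fragment are hypothetical and form no part of the described method. The sketch assumes the standard genetic code and renders any codon containing an ambiguous base as "X".

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical nine-base fragment used purely for illustration.
    for label, peptide in six_frame_translations("ATGGCCTAA").items():
        print(label, peptide)
```

A vector, or pool of vectors, that expresses an insert in every frame in effect presents each of these six translations to the screen.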

The present invention also contemplates amino acid sequences identified by the method of the present invention as well as the use of those molecules in a pharmaceutical composition. The pharmaceutical composition comprises an amino acid sequence capable of modulating or mediating a biological activity or the function of a biological molecule and a pharmaceutically acceptable carrier and/or diluent.

The present invention also provides a vector (or pool of up to 3 vectors) capable of expressing a nucleotide sequence in each of its possible reading frames and wherein each of the amino acid sequences so produced is expressed as a fusion with a second amino acid sequence in which it is conformationally constrained, wherein said vector at least comprises:

-   (i) a first expression cassette, comprising:
    -   (a) a multiple cloning site for insertion of nucleotide sequence encoding said amino acid sequence, wherein said multiple cloning site may be adjacent to one or more second nucleotide sequences encoding a polypeptide loop such that a fusion polypeptide is capable of being produced between said first and second amino acid sequences;
    -   (b) a terminator sequence adjacent to the multiple cloning site and distal to said promoter sequence and second nucleotide sequences;
-   (ii) a means for expressing the first nucleotide sequence in each of its reading frames;
-   (iii) a bacterial origin of replication and/or a bacteriophage origin of replication; and
-   (iv) a second expression cassette encoding a bacterial selection marker gene.

Another aspect of the present invention provides for modification of the target microorganism whose growth or alternate function may be inhibited. This microorganism may be modified for screening purposes in a manner that facilitates screening, such as by:

-   (i) The introduction of novel antibiotic resistance markers by homologous recombination, by transformation of plasmids or by random mutagenesis and selection;
-   (ii) The introduction (by homologous recombination or plasmid transformation) of one or more reporter gene/s (e.g. luciferase or β-galactosidase) under the control of an endogenous promoter associated with pathology or virulence. For example, the promoters for the RNAIII or RAP genes of Staphylococcus aureus could be used to control expression of a reporter gene that could be easily detected. Such methods are well known to those skilled in the art; see international (PCT) patent WO 90/40979, for example.

The present invention also provides a means of exploiting bioinformatic data concerning homologous sequences encoding structural domains in sequenced genomes, to design defined libraries by such techniques as degenerate PCR techniques or chemical DNA synthesis that focus on a particular affinity domain. The diversity of such a library may be further increased by mutagenesis techniques known to those skilled in the art.

The present invention also provides a high through-put screening technique for the identification of clones (from the library) that produce amino acid sequences capable of inhibiting growth or repressing virulence genes of the pathogenic target organism.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The screening methods described herein differ from existing rational design approaches that attempt to model candidate therapeutic peptides based on homologies in the databases to natural inhibitory peptides. The existing approaches focus on amino acid sequences that have previously been identified from their natural source due to their inhibitory properties. In contrast, the methods described herein empirically determine amino acid sequences that may modulate a biological activity from a wide array of candidates encoded in a genomic expression library derived from nucleotide sequences which have been completely determined, without regard for the original function of those sequences in nature.

Natural biologically interactive peptide and polypeptide domains are thought to have evolved by selection from a bank of available domains in each organism in which they arose. Within any organism there is a tremendous amount of diverse coding information. To harness this diversity a genetic screen has been devised which maximises the diversity of a pool of potential biologically interactive domains. Moreover, since the information used in the screen is derived from sequenced genetic information, structural information that has already evolved in nature may be exploited by comparing biologically interactive molecules against similar sequences from a sequenced and test nucleotide sequence. This information desirably permits the identification of particular amino acids that are essential to the binding action of the biological activity and/or possibly particular motifs that are essential to or at least implicated in the binding reaction. Thus, the present invention provides screening methods for identifying potential amino acid sequence(s) that are capable of modulating or mediating biological activities involving peptides, oligopeptides, proteins and/or nucleic acid sequences.

Therefore, the present invention resides in a method for identifying a modulator or mediator of a biological activity, which activity includes antigenicity and/or immunogenicity, said method comprising the steps of:

-   (i) producing a gene fragment expression library derived from defined nucleotide sequence fragments; and
-   (ii) assaying the expression library for at least an amino acid sequence derived from step (i) for a biological activity wherein that activity is different from any activity the amino acid sequence may have in its native environment.

As used herein, the term “biological activity” shall be taken to include biological interactions leading to a physical association between two or more molecules or “partners”. Such activity should be interpreted in its broadest context and include, for example, interactions such as peptide:peptide, peptide:protein, protein:protein, antigen:antibody, peptide:nucleic acid sequence, protein:nucleic acid sequence, peptide:ligand and protein:ligand. For example, the activity includes but is not limited to any interaction that modulates or mediates antibody binding or antigen binding or any other amino acid sequence based interaction described in the background section of this specification. Preferably, the physical association involves a cellular process or alternatively, is required for a cellular event to occur and wherein that activity is different from any activity the amino acid sequence may have in its native environment. In addition, it shall include activity that leads to the disruption of a biological structure and/or activity. The “physical association” may involve the formation of an induced magnetic field or paramagnetic field, covalent bond formation such as a disulfide bridge formation between polypeptide molecules, an ionic interaction such as occurs in an ionic lattice, a hydrogen bond or alternatively, a van der Waals interaction such as a dipole-dipole interaction, dipole-induced-dipole interaction, induced-dipole-induced-dipole interaction or a repulsive interaction or any combination of the above forces of attraction.

Fragments from any nucleotide sequence of a known nucleotide composition may be used in the present invention. Those skilled in the art will be aware of a variety of methods for producing nucleotide sequence fragments including: mechanical shearing (e.g. by sonication), digestion with a nuclease (e.g. DNase I), digestion with restriction enzyme/s, and polymerase chain reaction using degenerate primers. Preferably the nucleotide sequence is derived from a substantially sequenced genome of a microorganism and/or a compact eukaryotic species. More preferably, the nucleotide sequence is derived from a fully sequenced genome from a microorganism and/or a compact eukaryotic species. Most preferably a plurality of different nucleotide sequences are expressed in the gene fragment expression libraries, which sequences are derived from biodiverse organisms. Thus, biodiverse nucleotide sequences are desirably employed in the method of the invention to prepare the expression libraries. Where sequenced genomes or fragments thereof from different organisms are used in the method, each of the genomes or fragments thereof should be provided in equal molar amounts to ensure that an equal proportion of sequenced genomes or fragments thereof are included in the method.
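As a purely illustrative aid, the requirement for equal molar amounts can be expressed as a simple calculation: because the molar mass of a genome scales with its length, equal numbers of genome copies are obtained when the mass of DNA contributed by each organism is proportional to its genome size. The sketch below assumes double-stranded genomic DNA of comparable base composition; the organism labels, genome sizes and the function name are hypothetical.

```python
def equimolar_masses(genome_sizes_bp, total_mass_ng):
    """Split a total DNA mass so that every genome contributes the same
    number of genome copies (mass proportional to genome length)."""
    total_bp = sum(genome_sizes_bp.values())
    return {
        name: total_mass_ng * size / total_bp
        for name, size in genome_sizes_bp.items()
    }


if __name__ == "__main__":
    # Hypothetical genome sizes in base pairs, for illustration only.
    sizes = {"organism_A": 1_000_000, "organism_B": 3_000_000}
    for name, mass in equimolar_masses(sizes, total_mass_ng=400).items():
        print(f"{name}: {mass:.1f} ng")
```

A pool prepared by equal mass instead would over-represent the smaller genome in terms of copy number.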

Those working in the field will appreciate that a gene fragment expression library may be prepared using any expression vector known in the art. Preferably the vectors selected for use in the library possess strong promoters, thereby enhancing amino acid sequence expression. For example, in a bacterial system, vectors with bacterial-expressible promoters that may be used include, but are not limited to, pT7-Select, pET, pZero, pHook, pTYB or a derivative thereof. Other vectors that may be used are discussed in more detail below.

The amino acid sequence(s) derived from the gene fragment expression library may be expressed in a conformationally constrained or conformationally unconstrained form. Amino acid sequences that are expressed in a conformationally constrained form may be expressed within a second polypeptide as a fusion protein such that they are effectively “nested” in the secondary structure of the second polypeptide. Alternatively, the amino acid sequence(s) may be circularised by means of oxidising flanking cysteine residues to limit conformational diversity. This may be particularly beneficial where the amino acid sequence(s) are nested within a surface-exposed or functional site of a protein, such that they are accessible to the biological activity of interest. For example, the amino acid sequence(s) may be expressed within a thioredoxin (Trx) polypeptide loop. Whilst not being bound by any theory or mode of action, expression of the amino acid sequence(s) in a conformationally constrained form limits the degrees of freedom and the entropic cost associated with its binding, imparting a high degree of affinity and specificity to the interaction.

Those working in the field will appreciate that the present invention has broad reaching application. By way of exemplification, the present invention is particularly useful for screening gene fragment expression libraries for amino acid sequence(s) reactive with particular antibodies by, for example, affinity chromatography of a phage display library. Alternatively, biodiverse gene fragment libraries may be used to identify antigenic or immunogenic sequences that may be used for vaccines or for immunotherapy of allergic disease or autoimmune disease.

In one embodiment there is provided a method for identifying a modulator or mediator of a biological activity, which activity includes antigenicity and/or immunogenicity, said method comprising the steps of:

-   (i) producing a gene fragment expression library derived from defined nucleotide sequence fragments, which nucleotide sequence encodes at least a sequence of amino acids;
-   (ii) assaying the expression library for at least an amino acid sequence derived from step (i) for a biological activity wherein the library is adapted to display a range of amino acid sequences each of which may vary by at least an amino acid; and
-   (iii) identifying those amino acids essential for modulating the biological activity, which activity is different from the activity with which the sequence is normally associated in its native environment.

A sequence of amino acids that is particularly effective in modulating biological activity can be selected by comparing the observed biological activity from a series of different amino acid sequences of a similar constitution. Using differences in the observed biological activity it is possible to identify those amino acids essential for biological activity and those which are either desired for the activity or, in the alternate case, those which are a hindrance to achieving effective activity.
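The comparison described above may be sketched, for illustration only, as follows. The sketch assumes that the peptides being compared have already been aligned to a common length and that each has simply been scored as active or inactive. A position is flagged as a candidate essential residue when every active peptide shares the same residue there and at least one inactive peptide differs. The function name and example sequences are hypothetical, and a practical analysis would normally use graded activity data.

```python
def essential_positions(active, inactive):
    """Return {position: residue} for positions that are invariant among
    active peptides and altered in at least one inactive peptide."""
    length = len(active[0])
    if any(len(p) != length for p in active + inactive):
        raise ValueError("all peptides must be aligned to the same length")
    candidates = {}
    for i in range(length):
        residues = {p[i] for p in active}
        if len(residues) != 1:
            continue  # position tolerates variation among active peptides
        residue = next(iter(residues))
        if any(p[i] != residue for p in inactive):
            candidates[i] = residue
    return candidates


if __name__ == "__main__":
    # Hypothetical aligned peptides, for illustration only.
    active = ["ACDKF", "ACEKF", "GCDKF"]
    inactive = ["ACDRF", "AADKF"]
    print(essential_positions(active, inactive))
```

Positions that vary among the active peptides are reported as tolerant of substitution rather than essential.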

The present invention has broad reaching application for identifying amino acid sequences that have a novel activity compared with that which they are recognised as having in their ordinary natural environment.

In a particularly preferred form of this embodiment there is provided a method for identifying an amino acid sequence which has either antigenic or immunogenic activity, said method comprising the steps of:

-   (i) producing a gene fragment expression library derived from defined nucleotide sequence fragments, which nucleotide sequence encodes at least a sequence of amino acids;
-   (ii) assaying the expression library for at least an amino acid sequence derived from step (i) for an antigenic or immunogenic activity wherein the library is adapted to display a range of amino acid sequences each of which may vary by at least an amino acid;
-   (iii) identifying those amino acid sequences essential for modulating or mediating the antigenic or immunogenic activity; and
-   (iv) selecting those sequences from the identification step in step (iii) that are not associated with the antigenic or immunogenic activity in their native environment.

Preferably the gene fragment libraries employed in this embodiment of the invention are used to identify or produce antigens that can be used for vaccines or for immunotherapy of allergic disease or autoimmune disease. In the case of allergen immunotherapy it is especially desirable that high affinity peptides are identified (which are rare from random peptide libraries) because they may be used as monovalent antigens to avoid specific crosslinking immunological reactions such as crosslinking of IgE on mast cells.

In a second embodiment the peptide libraries of the present invention may be employed to identify novel antibacterial amino acid sequences that are conditionally released from a fusion protein. According to this embodiment, there is provided a method of identifying an antibacterial amino acid sequence, comprising:

-   (i) transforming or transfecting a first bacterial population of cells with a peptide expression library derived from defined nucleotide sequence fragments;
-   (ii) growing said first bacterial population for a time and under conditions sufficient for expression of the amino acid sequences encoded within said library to occur and for release of the amino acid sequences from their cognate fusions;
-   (iii) contacting the expressed amino acid sequences with pathogenic bacteria;
-   (iv) identifying those sequence(s) that are capable of inhibiting the growth of the pathogenic bacteria, or killing the pathogenic bacteria; and
-   (v) selecting those sequences from the identification step in step (iv) that are not associated with the inhibition of growth of the pathogenic bacteria, or killing the pathogenic bacteria in their native environment.

It should be appreciated that the method described in this embodiment has broad reaching application for identifying novel amino acid sequences that are capable of inhibiting the growth of pathogenic bacteria, or killing pathogenic bacteria.

In a highly preferred form of this embodiment, nucleotide sequences encoding peptide(s) or peptide fusions are inserted within the cloning site of a T7-Select phage vector (Invitrogen) with or without the introduction of a conditional protein cleavage site (such as the temperature sensitive protein splicing element ‘intein’ modified from the element found in the Saccharomyces cerevisiae VMA1 gene (e.g. IMPACT T7 system, New England Biolabs)) cloned into the fusion junction of the vector. The first bacterial population is then grown for a time and under conditions sufficient for expression of the peptides encoded by said library to occur. In cases where conditional cleavage of the peptide from its fusion context is desired (e.g. the intein system), the bacterial/phage population may be put under conditions where cleavage can occur (e.g. low temperature in the case of the intein mutant cleavage). The individual clones or pools of clones in said library are then separated into replica arrays. At least one of said replicated arrays is then lysed to produce a lysate array. Note that this is not necessary in the case of lytic phage vectors such as T7-Select. The lysate array is then brought into physical relation with pathogenic bacteria. Those lysates that are capable of inhibiting the growth of the pathogenic bacteria, or killing the pathogenic bacteria, can then be identified by standard techniques.

For convenience, the pathogenic bacterium described in this embodiment may be contained within a bacterial lawn on solid media; however, this is not essential to the performance of this embodiment.

Preferably, the subject method further comprises the step of keying the lysate back to the replicated array to localise the bacterial cell that expresses the same antibacterial peptide as that expressed in said lysate. More preferably, the genetic sequence encoding the peptide is isolated for the purposes of producing the antibacterial peptide encoded thereby.
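For illustration only, the keying step can be thought of as a lookup from array coordinates to clone identifiers. The sketch below assumes that the master array and the lysate array share the same (row, column) layout and that positions showing growth inhibition have already been recorded. All identifiers and the function name are hypothetical.

```python
def key_back(master_array, clearing_positions):
    """Map each position that showed growth inhibition on the lysate array
    to the clone held at the same position in the master array."""
    return {
        pos: master_array[pos]
        for pos in clearing_positions
        if pos in master_array
    }


if __name__ == "__main__":
    # Hypothetical 2 x 2 master array keyed by (row, column).
    master = {
        (0, 0): "clone_001", (0, 1): "clone_002",
        (1, 0): "clone_003", (1, 1): "clone_004",
    }
    print(key_back(master, [(0, 1), (1, 0)]))
```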

In an exemplification of this embodiment, Escherichia coli BL21 lysates containing protein expressed from pET peptide libraries are assayed for their ability to inhibit the growth of pathogenic microorganisms or, alternatively, for their ability to kill pathogenic microorganisms, wherein individual clones derived from a population of cells transformed or transfected with the subject peptide library are either replica-plated onto semi-permeable membranes, such as nitrocellulose or nylon membranes, or alternatively, replica-picked, to master cultures and cultures in which expression of the cloned peptide sequence is to be induced, prior to lysis. Replica-plating and/or replica-picking can be performed manually or with the assistance of robotics. Samples comprising those colonies in which expression is to be induced are lysed, for example by exposure to chloroform or by infection with a bacteriophage such as T7 bacteriophage, and overlaid on a freshly seeded lawn of pathogenic bacteria.

In the case of lytic phage libraries (such as those made in the T7-Select system), a double-faced petri-dish can be used. In this case a phage overlay occupies one face of the dish, which is separated from the other face by a supported semi-permeable membrane (made of a material such as nitrocellulose or nylon) on which a seeded lawn of the pathogenic bacteria lies. Thus the semi-permeable membrane separates the phage overlay from the pathogenic bacteria, which can be grown on different media respectively (see example).

The ability of individual peptide-expressing clones to inhibit growth or to kill the pathogenic bacterium in question is assayed by detecting the presence of a ‘plaque-like’ “clearing” or “hole” in the lawn of pathogenic bacteria directly beneath the position where the lysate containing the expressed antibacterial peptide occurs.

Those skilled in the art will recognise that this method provides an opportunity of isolating a phage or plasmid clone expressing the activity that gave rise to the corresponding hole in the lawn on the opposite face.

In a third embodiment, there is provided a method for identifying a modifier of a biological activity associated with a host cell, said method comprising the steps of:

-   (i) Expressing a reporter molecule operably under the control of the biological activity in the cell, wherein at least a molecule associated with the biological activity comprises an amino acid sequence encoded by a nucleotide sequence that is placed operably in connection with a promoter;
-   (ii) Incubating at least a cell from step (i) in the presence of an amino acid sequence(s) from a gene fragment expression library derived from a defined genomic sequence, under conditions promoting interaction between the amino acid sequence(s) and a nucleotide or amino acid sequence involved with the biological activity;
-   (iii) Identifying at least an amino acid sequence that in the presence of the cells is capable of modifying expression of said reporter molecule, or the biological activity; and
-   (iv) Selecting those sequences in step (iii) that are not generally recognised as being able to modify expression of said reporter molecule, or the biological activity, in their native environment.

Preferably the method is repeated as often as is necessary to ensure that substantially all of the amino acids encoded by the defined nucleotide sequence are presented to the biological activity.

In a particularly preferred form of the third embodiment the gene fragment expression library is prepared in a pET vector, such as those that are commercially available from Novagen. pET vectors as described herein are particularly useful in such applications by virtue of the strong T7 promoter sequence contained therein, which facilitates bacterial expression in strains expressing T7 polymerase. Those skilled in the art will appreciate that other bacterial expression vectors will be equally applicable.

In a highly preferred form of this embodiment, a nucleotide sequence(s) derived from a defined genetic sequence is incorporated into a pET vector such that the nucleotide sequence is operably linked to an appropriate bacterial translation initiation sequence as described supra. A second nucleotide sequence may further be expressed in association with the first nucleotide sequence such that the resultant peptide is constrained within the active site loop of thioredoxin or within oxidised flanking cysteine residues. As with other embodiments of the invention, the second nucleotide sequence may be synthetic and/or derived from genomic sources.

Expression from the pET vector is achieved by infection of bacteria which contain the library plasmid with bacteriophage T7 or, alternatively, by using publicly available strains such as E. coli BL21, which contain the T7 polymerase gene under lac control, because in such strains IPTG may be added to growth media to induce expression of the T7 polymerase gene. Derivatives of the strain BL21 (such as strain BL21trxB(DE3)), which contain a mutation in the thioredoxin reductase gene trxB, are particularly useful for ensuring that disulphide bonds remain oxidised in the bacterial cytoplasm.

This embodiment is particularly useful for identifying antagonists of a biological activity. In such situations, the undesirable biological activity is preferably functional in the absence of the drug being screened and perturbation of that interaction is assayed in the presence of a candidate drug compound, wherein modified reporter gene expression is detected in the manner described for other embodiments of the invention.

Preferably, where the reporter molecule is lethal to the bacterial cell, expression thereof should not occur until the amino acid sequence(s) candidate compound is provided to the cell for a time and under conditions sufficient to antagonise the biological activity leading to reporter expression. Accordingly, in a preferred form this embodiment provides a method of identifying an antagonist of a biological activity in a bacterial cell, comprising the steps of:

-   (i) placing the expression of a cytostatic or cytotoxic reporter molecule operably under the control of a biological activity in said cell, wherein at least one binding partner in said biological activity comprises an amino acid sequence encoded by a nucleotide sequence that is placed operably in connection with a bacterially-expressible promoter;
-   (ii) incubating the cell in the presence of at least an amino acid sequence candidate compound to be tested for its ability to antagonise the biological activity for a time and under conditions sufficient for antagonism to occur, wherein the amino acid sequence candidate compound is derived from a gene fragment expression library derived from a defined genomic sequence;
-   (iii) expressing the binding partner under the control of the bacterially expressible promoter for a time and under conditions sufficient to result in expression of the reporter molecule in the absence of antagonism; and
-   (iv) selecting surviving or growing cells.

Preferably, the inducible bacterially-expressible promoter is the T7 promoter. In such circumstances, the expression of the reporter molecule may be induced by infecting cells with bacteriophage T7, which supplies the T7 polymerase function. Alternatively, the bacterial cell may be a cell that contains the T7 polymerase under lac control (e.g. E. coli BL21 cells), in which case the promoter may be induced by the addition of IPTG to growth medium. The candidate compound may be any small molecule, drug, antibiotic or other compound, the only requirement being that it is capable of permeating or being actively taken up by the bacterial cell or alternatively, is modified by the addition of a carrier molecule to facilitate such uptake.

In a fourth embodiment there is provided a method of identifying an antagonist of a biological activity, said method comprising the steps of:

-   (i) placing expression of a reporter molecule operably under the control of a biological activity in a cell, wherein at least one partner of said biological activity comprises an amino acid sequence encoded by a nucleotide sequence that is placed operably in connection with a bacterial-expressible promoter in a suitable vector, wherein (a) the nucleotide sequence is derived from a nucleotide sequence of known and sequenced origin and (b) the biological activity is different from any activity that the amino acid sequence may have in its native environment;
-   (ii) incubating the cell in the presence of a candidate compound to be tested for the ability to antagonise the biological activity; and
-   (iii) selecting cells wherein expression of said reporter molecule, or biological activity, is modified.

This method is particularly useful for identifying novel drugs such as antibiotics or inhibitory agents that may serve as candidate agonists and antagonists of any biological activity. Moreover, this system may be used in high through-put screening for novel antibiotics or other inhibitory agents which target specific amino acid sequence:nucleic acid sequence interactions or amino acid sequence:amino acid sequence interactions.

Preferably, where the reporter molecule is lethal to the bacterial cell, expression thereof should not be allowed until the candidate compound is provided to the cell for a time and under conditions sufficient to antagonise the biological activity leading to reporter expression. Accordingly, a preferred aspect of this embodiment provides a method of identifying an antagonist of a biological activity in a bacterial cell, comprising:

-   (i) placing the expression of a cytostatic or cytotoxic reporter molecule operably under the control of a biological activity in said cell, wherein at least one binding partner in said biological activity comprises an amino acid sequence encoded by a nucleotide sequence that is placed operably in connection with a bacterially-expressible promoter, wherein (a) the nucleotide sequence is defined and is derived from a nucleotide sequence of known origin and (b) the biological activity is different from any activity the amino acid sequence may have in its native environment;
-   (ii) incubating the cell in the presence of a candidate compound to be tested for its ability to antagonise the biological activity for a time and under conditions sufficient for antagonism to occur;
-   (iii) expressing the binding partner under the control of the bacterially expressible promoter for a time and under conditions sufficient to result in expression of the reporter molecule in the absence of antagonism; and
-   (iv) selecting surviving or growing cells.

In a highly preferred example of this embodiment, the inducible bacterially-expressible promoter is the T7 promoter. A person skilled in the field will observe that any other inducible bacterial promoter may be used in the invention. This embodiment is only being exemplified in relation to the T7 promoter for convenience. In such circumstances, the expression of the reporter molecule may be induced by infecting cells with bacteriophage T7, which supplies the T7 polymerase function. Alternatively, the bacterial cell may be a cell which contains the T7 polymerase under lac control (e.g. E. coli BL21 cells), in which case the promoter may be induced by the addition of IPTG to growth medium. The candidate compound may be any small molecule, drug, antibiotic or other compound, the only requirement being that it is capable of permeating or being actively taken up by the bacterial cell or, alternatively, is modified by the addition of a carrier molecule to facilitate such uptake.

Desirably, the present invention employs a gene fragment expression library made from defined genomic sequence present either in isolation or in combination with other defined genomic sequence to identify amino acid sequence(s) that may be suitable candidates for rational drug design while at substantially the same time providing comprehensive bioinformatic data about those candidates. The bioinformatic data derived from the method may be used to identify those amino acids important in modulating the biological activity.

Using knowledge of the phylogenetic relationship between microorganisms, a mixture of particular genomes can be designed to maximise the sequence diversity in the peptide expression library. This approach has several distinct advantages over cloning and expressing DNA purified directly from the environment. First, the true diversity and bias of the library can be more easily approximated. Hence measures can be implemented to maximise the domain diversity and to minimise bias towards the genomes of dominant species. Second, artificially pooling DNA derived from distinct known organisms allows unique opportunities to survey diverse genomes that may not occur together in nature. For example, the genomes of certain archaebacteria could be simultaneously screened with those of obligate parasites such as mycoplasmas and/or diverse gram positive and/or gram negative organisms. Third, the alignment of sequences derived from a screen can be used to reveal consensus motifs. Moreover, other potentially related motifs can be excluded as drug candidates if they are not identified from any of the genomes in which they theoretically occur despite exhaustive screening at a complexity predicted to cover all of the potential domains encoded by the genome/s; such motifs were presumably screened yet failed to exhibit the required activity. This information can be used to design optimal peptides that mimic the consensus motifs identified in the biological screen while lacking alternative residues of structurally related peptides that were presumably included in the exhaustive screen. Finally, using the pooled genomes of sequenced organisms facilitates certain powerful bioinformatic analyses that may be useful in the design of therapeutic peptides.

In a fifth embodiment there is provided a method for identifying a modulator of a biological activity, said method comprising the steps of:

-   (i) producing a gene fragment expression library derived from a defined genomic sequence;
-   (ii) contacting an amino acid sequence derived from the expression library with a reporter molecule that is operably under the control of a biological activity associated with a host; and
-   (iii) identifying an amino acid sequence capable of modulating the biological activity wherein that activity is different from any activity the amino acid sequence may have in its native environment.

Preferably, at least one of the partners in the biological activity contemplated by this embodiment is a peptide, polypeptide, protein or enzyme molecule or a derivative thereof. According to this embodiment, the remaining partner(s) is (are) a molecule selected from the list comprising nucleic acid such as single-stranded or double-stranded RNA or DNA, a peptide, polypeptide, protein, enzyme, carbohydrate, amino acid, nucleotide, nucleoside, lipid, lipoprotein, vitamin, co-enzyme, receptor molecule, hormone, chemical compound, cyclic AMP, metal ion or second messenger molecule, amongst others. More preferably, the biological activity is a protein:protein interaction or a protein:peptide interaction or a protein:polypeptide interaction.

In a particularly preferred form, the biological activity is between a first partner comprising an amino acid sequence and a second partner comprising a nucleic acid molecule such as DNA or RNA or, alternatively, an amino acid sequence or a derivative or analogue thereof.

According to a sixth embodiment, there is provided a method for identifying an amino acid sequence that is capable of modulating a biological activity in a host cell, said method comprising the steps of:

-   (i) producing a library in a host wherein (a) the transformed cells of said library contain at least a first nucleotide sequence that comprises or encodes a reporter molecule the expression of which is operably under control of said biological activity and a second nucleotide sequence derived from a known genomic sequence that is capable of encoding the amino acid sequence when placed operably under the control of a suitable promoter sequence and wherein (b) substantially all of the known genomic sequence is present within the population of transformed cells making up said library and the biological activity is different from any activity the amino acid sequence may have in its native environment;
-   (ii) culturing said cellular host for a time and/or under conditions sufficient for expression of said second nucleotide sequence to occur; and
-   (iii) selecting or screening for cells wherein expression of said reporter molecule is modified.

The second nucleotide sequence used in the method may be derived from any known genomic sequence. By using a sufficient number of second nucleotide species to ensure that the entire known genomic sequence is assayed, bioinformatic data can be gathered not only from sequences which gave a positive result in the test system but also from those sequences which failed to react. By comparing reactive amino acid sequences against similar sequences in a genome that either caused a reaction or alternatively failed to cause a reaction, sequence motifs as well as individual amino acids can be identified that may be implicated in a biological activity. In addition, if the screen is sufficiently comprehensive to ensure adequate coverage, certain alternative residues/motifs represented in the library can be shown to be suboptimal if incorporated into the design of inhibitors of the activity.
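The number of clones needed to assay the entire known genomic sequence can be estimated with the standard Clarke-Carbon library coverage formula. The Python sketch below is illustrative only: the formula is not stated in this specification, and the function name and the example genome and insert sizes are assumptions.

```python
import math

def clones_required(genome_bp, insert_bp, probability=0.99):
    """Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f), where f is the
    fraction of the genome carried by one insert and P is the desired
    probability that any given sequence is represented in the library."""
    f = insert_bp / genome_bp
    n = math.log(1.0 - probability) / math.log(1.0 - f)
    return math.ceil(n)

# Hypothetical example: a 4.6 Mb genome cloned as 300 bp fragments.
n_99 = clones_required(4_600_000, 300, probability=0.99)
n_999 = clones_required(4_600_000, 300, probability=0.999)
```

The estimate covers one strand and one reading frame only; a library intended to express fragments in every frame and orientation would need proportionately more clones.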

Thus, in a preferred form this embodiment provides a method of identifying an amino acid sequence(s) that is capable of modulating a biological activity in a host, said method comprising the steps of:

-   (i) producing a peptide library in a host wherein (a) the transformed cells of said library contain at least a first nucleotide sequence which comprises or encodes a reporter molecule the expression of which is operably under control of said biological activity and a second nucleotide sequence derived from a known genomic sequence which is capable of encoding said amino acid sequence(s) when placed operably under the control of a suitable promoter sequence and wherein (b) substantially all of the known genomic sequence is present within the population of transformed cells making up said library;
-   (ii) culturing said cellular host for a time and/or under conditions sufficient for expression of said second nucleotide sequence to occur;
-   (iii) selecting or screening for cells wherein expression of said reporter molecule is modified;
-   (iv) comparing the range of amino acid sequences that can be derived from the known genomic sequence against those sequences which modulated biological activity; and
-   (v) determining those amino acids which are essential for modifying the reporter molecule activity.
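Steps (iv) and (v) amount to comparing the peptides that modulated the reporter with related peptides that were screened but did not. A minimal Python sketch of one way to do this follows. It assumes (hypothetically) that the peptides have already been aligned to equal length; the function name and the example sequences are invented for illustration and are not data from this specification.

```python
def candidate_essential_positions(hits, non_hits):
    """Return {position: residue} for alignment positions where every hit
    carries the same residue and at least one non-hit carries a different
    residue at that position."""
    length = len(hits[0])
    if any(len(p) != length for p in hits + non_hits):
        raise ValueError("peptides must be pre-aligned to equal length")
    essential = {}
    for i in range(length):
        residues = {p[i] for p in hits}
        if len(residues) == 1:
            residue = residues.pop()
            # A non-hit that differs here suggests the residue matters.
            if any(p[i] != residue for p in non_hits):
                essential[i] = residue
    return essential

# Invented example peptides (0-based positions).
example = candidate_essential_positions(["ACDEF", "ACDKF"], ["AGDEF"])
```

Positions that are invariant in both hits and non-hits are deliberately not reported, because the screen gives no evidence about them either way.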

In another embodiment the present invention therefore provides a vector capable of expressing a nucleotide sequence in each of its possible reading frames, wherein each of the amino acid sequences so produced is expressed as a fusion with a second amino acid sequence in which it may be conformationally constrained, wherein said vector at least comprises:

-   (i) a first expression cassette, comprising:
    -   (a) a multiple cloning site for insertion of a first nucleotide sequence encoding said first amino acid sequence, wherein said multiple cloning site may be adjacent to one or more second nucleotide sequences encoding a polypeptide loop such that a fusion polypeptide is capable of being produced between said first and second amino acid sequences;
    -   (b) a terminator sequence adjacent to the multiple cloning site and distal to said promoter sequence and second nucleotide sequences;
-   (ii) a means for expressing the first nucleotide sequence in each of its reading frames;
-   (iii) a bacterial origin of replication and/or a bacteriophage origin of replication; and
-   (iv) a second expression cassette encoding a bacterial selection marker gene.

In an alternative embodiment, the expression vector of the invention further comprises a second expression cassette comprising a selectable marker gene operably linked to two or more promoter sequences and placed upstream of a terminator sequence, wherein one of said promoter sequences is a bacterially-expressible promoter and wherein one of said promoter sequences is a yeast-expressible promoter.

In another alternative embodiment, the subject vector is further modified to provide for inducible extracellular expression by means of signal peptide fusions and/or conditional lysis systems. Conditional lysis may be achieved by expression of an inducible lytic gene in bacterial cells, by introducing such sequences into an expression cassette between an inducible bacterial promoter (such as the lac, tac or the more tightly regulated araBAD promoters) and a transcriptional termination sequence, in tandem array with the promoter and terminator sequences already present in the subject expression cassettes.

In a still further embodiment, the conditional lysis of bacteria expressing the said peptide/polypeptide is brought about by alternative means such as by infection with a suitable bacteriophage or by exposure to appropriate chemical agents such as chloroform and/or SDS. In a particularly preferred form of the invention the vector also includes a third expression cassette allowing conditional expression of a lytic gene (such as those genes produced by bacteriophages).

The present invention also contemplates amino acid sequence(s) identified by the method of the present invention as well as use of those molecules in a pharmaceutical composition. The pharmaceutical composition comprises an amino acid sequence(s) capable of modulating a biological activity or the function of a biological molecule and a pharmaceutically acceptable carrier and/or diluent.

Biodiverse Nucleotide Sequence Fragments

Where sequenced genomes from different organisms are used in the above embodiments, each of the genomes should be provided in equal molar amounts to ensure that an equal proportion of sequenced genomes are included in the method. Because the genomes are of a known size, standard normalisation methods can be applied to ensure that the concentration of one organism's genome is not proportionally greater than that of another organism's genome. Such methods for equalising genomic concentrations are well known to those skilled in the art and include, by way of example, the contribution of proportionately more DNA to the pool from the genomes which are larger, to compensate for the tendency for fragments from such genomes to be under-represented if an equal mass of DNA from each genome is combined. In addition, normalisation by other means known to those skilled in the art, such as disclosed in U.S. Pat. No. 5,763,239, is contemplated by the present invention.
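The mass-based normalisation described above can be sketched in a few lines of Python. Because the mass of one genome copy is proportional to its length, pooling equal molar amounts means contributing mass in proportion to genome size. The function name, the placeholder genome names and the sizes below are assumptions chosen for illustration only.

```python
def equimolar_masses(genome_sizes_bp, total_mass_ng):
    """Split total_mass_ng between genomes so that each contributes the same
    number of genome copies; mass is allotted in proportion to genome size."""
    total_bp = sum(genome_sizes_bp.values())
    return {
        name: total_mass_ng * size / total_bp
        for name, size in genome_sizes_bp.items()
    }

# Hypothetical genome sizes in base pairs.
sizes = {"genome_A": 4_600_000, "genome_B": 1_800_000, "genome_C": 800_000}
masses = equimolar_masses(sizes, total_mass_ng=720.0)
```

With these placeholder sizes the largest genome supplies well over half of the pooled DNA, which is the compensation for under-representation that the paragraph describes.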

The present invention attempts to accelerate the evolutionary process by artificially combining domains from different genomes that would have been unlikely to co-evolve. Preferably, the genomic expression libraries are prepared from evolutionarily diverse organisms. For example, the organisms could be derived either from compact eukaryotic genomes such as Fugu rubripes, Caenorhabditis elegans and Saccharomyces cerevisiae, and/or from prokaryotic microorganisms that have been characterised genetically, such as E. coli, Aquifex aeolicus, Methanococcus jannaschii, Bacillus subtilis, Haemophilus influenzae, Helicobacter pylori, Neisseria meningitidis, Synechocystis sp., Bordetella pertussis, Pasteurella multocida, Pseudomonas aeruginosa, Borrelia burgdorferi, Methanobacterium thermoautotrophicum, Mycoplasma pneumoniae, Archaeoglobus fulgidus and Vibrio harveyi. Those skilled in the art are aware that the number of sequenced genomes is increasing rapidly (compilations of sequenced genomes can be readily obtained by reference to the World Wide Web; see, e.g., the following URLs):

http://www.tigr.org/tdb/mdb/html

http://www.sanger.ac.uk

http://www.genome.ad.jp/kegg/java/org_proj.html

http://www-fp.mcs.anl.gov/~gaasterland/genomes.html

http://www.ncgr.org/

http://www.cbs.dtu.dk/databases/DOGS/index.html

http://geta.life.uiuc.edu/~nikos/genomes.html

and that the methods described here are applicable to any subset of the entire pool of sequenced genomes.

The defined nucleotide sequence from which the known nucleotide sequence is derived is not limited only to those sequences that encode amino acids in naturally derived proteins, but also includes non-coding nucleotide sequences. Thus, it should be understood that the second nucleotide sequence may be derived from a 5′ UTR, an intron (where applicable), a 3′ UTR, or alternative reading frames/orientations of the cloned fragment.

Diversity within a pool of sequenced nucleotide sequences may also be expanded by subjecting the sequences to methods that mis-read or mutate those fragments. Thus, in an embodiment of the invention the method may also include a step of artificially mutating the domain libraries. Such methods are well known in the art.

One way to achieve this end would involve mutation of the known genomic sequence prior to insertion into an expression vector. Thus, in one preferred form, the method of the invention might include the step of: subjecting the known nucleotide sequence to mutagenesis prior to insertion into the expression vector. This may be achieved, for example, by amplifying the sequenced genomes using mutagenic PCR procedures such as those that include the step of performing the PCR reaction in the presence of manganese. It has been calculated that, with an error rate of 0.5 bases per 100 bp per cycle, eight mutagenic cycles will produce base changes in 90% of the PCR products and almost 50% will have 2 or 3 substitutions.
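The figures quoted above are consistent with a simple Poisson model of the number of substitutions per PCR product. The Python sketch below is an assumption made for illustration: the specification does not state how the calculation was performed or what product length was assumed. It finds the mean number of substitutions per product at which 90% of products are mutated, then computes the fraction carrying 2 or 3 substitutions under the same model.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k substitutions when the mean is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def fraction_mutated(mean):
    """Fraction of products with at least one substitution."""
    return 1.0 - poisson_pmf(0, mean)

def fraction_with_two_or_three(mean):
    """Fraction of products with exactly 2 or exactly 3 substitutions."""
    return poisson_pmf(2, mean) + poisson_pmf(3, mean)

def mean_for_fraction_mutated(target):
    """Mean substitutions per product giving the target mutated fraction."""
    return -math.log(1.0 - target)

mean_90 = mean_for_fraction_mutated(0.90)
two_or_three = fraction_with_two_or_three(mean_90)
```

Under this assumed model a mean of roughly 2.3 substitutions per product leaves 90% of products mutated and a little under half of them with 2 or 3 substitutions.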

Another way in which the domain libraries might be mutated would be through expression of nucleotide fragments in cells that are modified to mutate sequence information. Such strains are deficient in certain enzymes, making their mutation rate approximately 5,000 to 10,000 times higher than in the wild-type parent. Thus, the method may include the step of: expressing the biodiverse gene fragments in one or more cell lines that are deficient in at least a DNA repair enzyme. For example, once constructed, the plasmid library can be amplified in bacterial strains deficient in mismatch repair (e.g. strains containing the mutS, mutD and/or mutT mutation), resulting in the generation of mutations. In one exemplification of this embodiment, peptide libraries derived from the expression of genomic DNA are amplified or propagated in bacterial strains which are defective in the epsilon (ε) subunit of DNA polymerase III (i.e. dnaQ and mutD alleles) and/or are defective in mismatch repair. Escherichia coli mutator strains possessing the mutY and/or mutM and/or mutD and/or mutT and/or mutA and/or mutC and/or mutS alleles are particularly useful for such applications. Bacterial strains carrying such mutations are readily available to those skilled in the art.

Where fragments are mutated prior to generation of an expression library, both mutated and unmutated fragments should preferably be combined in the same preparation and are preferably expressed using vectors described herein. The mutated and unmutated libraries will undergo the same selection procedures. The specificity and biological activity of the peptides should then be compared and examined.

To enhance diversity within the sequenced genomic peptide library, the fragmented sequenced genomes may also be expressed in each of their different reading frames. Expression of such sequences in this manner may be achieved by any method known in the art including, for example, by ligating the fragments to adaptors and/or linkers in the three different reading frames or by placing the fragments under the control of internal ribosome entry site/s (IRES) and/or sequences conferring transcriptional/translational slippage. If adaptors are used, a single vector may contain each of the different adaptors or each adaptor may be provided in a different vector.

The fragments may also be expressed in the reverse reading frames. Thus, allowing for expression of a gene sequence in each possible reading frame, any particular nucleotide fragment can give rise to six different possible peptide sequences.
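The six possibilities are the three forward reading frames and the three frames of the reverse complement. The Python sketch below illustrates this with the standard genetic code; the function names and the example fragment are assumptions, and the code is an in silico illustration rather than a description of the cloning procedure.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated peptides.
    Only the letters A, C, G and T are handled in this sketch."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

frames = six_frame_translations("ATGGCC")
```

For the fragment ATGGCC, frame +1 reads Met-Ala while frame -1 reads the reverse complement GGCCAT as Gly-His, so one insert yields unrelated peptides depending on frame and orientation.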

The presence of clones in all reading frames allows the simultaneous screening of random peptides expressed in reading frames that do not occur in nature, together with a variety of natural peptide domains cloned in the appropriate reading frame. This allows a comparison of the relative success of isolating inhibitors from natural peptide libraries as opposed to random peptide libraries. The screening methods described herein are also applicable to the screening of libraries of constrained or unconstrained random peptides derived from artificial, non-biological sources.

DEFINITIONS

As used herein, the phrase “not normally associated in its native environment” shall refer to an activity that the amino acid sequence is not typically associated with. Further, as used herein, “native environment” shall be understood to refer to the biological environment in which the amino acid sequence is typically found in nature.

As used herein, the term ‘domain’ shall be taken to mean a functional unit of an amino acid sequence(s) possessing activity in isolation or in an artificial context and does not necessarily imply any structural features.

As used herein, ‘amino acid sequence’ shall include peptides, oligopeptides and polypeptides, including derivatives and analogues thereof, being comprised of a number of residues ranging from 1 to 500.

As used herein, the term ‘aptamer’ shall be taken to include the highly specific, normally conformationally constrained peptides related to the class described by Brent and colleagues (3).

As used herein, the term ‘activity’ shall be taken to include any enzymatic activity, structural or conformational change occurring outside or inside the cell.

As used herein, the term ‘gene fragment expression library’ shall be taken to include any expression libraries made using inserts derived from genomic fragments or PCR products of a range of distinct prokaryotic genomes and/or compact eukaryotic genomes.

As used herein, the term “derivative” shall be taken to refer to mutants, parts or fragments of a complete polypeptide as defined herein which are functionally equivalent. Derivatives include modified peptides in which ligands are attached to one or more of the amino acid residues contained therein, such as functional groups, carbohydrates, enzymes, proteins, polypeptides or reporter molecules such as radionuclides or fluorescent compounds. Glycosylated, fluorescent, acylated or alkylated forms of the subject peptides are also contemplated by the present invention. Procedures for derivatizing proteins and peptides are well known in the art.

“Analogues” of a peptide, protein, polypeptide or enzyme are functionally equivalent molecules that comprise one or more non-naturally occurring amino acid analogues known to those skilled in the art.

The terms “host” and “cellular host” or similar terms refer to prokaryotic and eukaryotic cells capable of supporting the expression of a reporter molecule under the control of a biological activity, irrespective of whether or not the biological activity or the reporter molecule is endogenous to the cell.

Those skilled in the art will be aware that a “transformed cell” is a cell into which exogenous nucleic acid has been introduced, wherein the exogenous nucleic acid is either integrated into the host cell genome or, alternatively, maintained therein as an extrachromosomal genetic element such as a plasmid, episome or artificial chromosome, amongst others.

The transformed cell of the present invention may be any cell capable of supporting the expression of exogenous DNA, such as a bacterial cell, insect cell, yeast cell, mammalian cell or plant cell. In a particularly preferred embodiment of the invention, the cell is a bacterial cell, mammalian cell or a yeast cell. In a more particularly preferred embodiment of the invention, the cell is a yeast cell.

The term “expression” refers at least to the transcription of a nucleotide sequence to produce an RNA molecule. The term “expression” may also refer to the combined transcription and translation of a nucleotide sequence to produce a peptide, polypeptide, protein or enzyme molecule or, alternatively, to the process of translation of mRNA to produce a peptide, polypeptide, protein or enzyme molecule.

By “operably under control” is meant that a stated first integer is regulated or controlled by a stated second integer.

In the present context, where the expression of the reporter molecule is operably under control of a biological activity, said expression is modified (i.e. enhanced, induced, activated, decreased or repressed) when a peptide, oligopeptide or polypeptide capable of enhancing, inducing, activating, decreasing or repressing the formation of said biological activity is expressed. Accordingly, it is not usually sufficient for only one partner in the biological activity to be present for such modified expression of the reporter molecule to occur; however, there may be some expression of the reporter molecule in the presence of only one partner.

As used herein, the term “peptide library” refers to a set of diverse nucleotide sequences encoding a set of amino acid sequences, wherein said nucleotide sequences are preferably contained within a suitable plasmid, cosmid, bacteriophage or virus vector molecule which is suitable for maintenance and/or replication in a cellular host. The term “peptide library” further encompasses random amino acid sequences derived from a known genomic sequence, wherein the amino acid sequences are encoded by a second nucleotide sequence obtained, for example, by shearing or partial digestion of genomic DNA using restriction endonucleases or nucleases such as DNase I, amongst other approaches.

Preferred peptide libraries according to this embodiment of the invention are “representative libraries”, comprising a set of amino acid sequences or nucleotide sequences encoding same, which includes virtually all possible combinations of amino acid or nucleotide sequences for a previously defined and specified length of peptide or nucleic acid molecule, respectively.

In a particularly preferred embodiment of the invention, the peptide library comprises cells, virus particles or bacteriophage particles comprising a diverse set of nucleotide sequences which encode a diverse set of amino acid sequences, wherein the members of said diverse set of nucleotide sequences are placed operably under the control of a promoter sequence which is capable of directing the expression of said nucleotide sequence in said cell, virus particle or bacteriophage particle.

Accordingly, the amino acid sequence encoded by the second nucleotide sequence may comprise any sequence of amino acids of at least about 1 to 100 amino acids in length and preferably 1 to 60 amino acids in length and may be derived from the expression of known nucleotide sequences which are prepared by any one of a variety of methods such as, for example, random synthetic generation. More preferably, the peptide unit is a 6 to 20 amino acid peptide. The use of larger nucleotide fragments, particularly employing randomly sheared nucleic acid derived from bacterial, yeast or animal genomes, is not excluded.

Alternatively or in addition, the amino acid sequence may be expressed as a fusion protein with a nuclear targeting motif capable of facilitating targeting of said peptide to the nucleus of said host cell where transcription occurs, in particular the SV40 nuclear localisation signal, which is functional in yeast and mammalian cells.

Alternatively, or in addition, the amino acid sequence may be expressed as a fusion protein with a peptide sequence capable of enhancing, increasing or assisting penetration or uptake of the peptide by an isolated cell, such as when the subject amino acid sequence is synthesized ex vivo and added to isolated cells in culture. In a particularly preferred embodiment, the peptide sequence capable of enhancing, increasing or assisting penetration or uptake is functional in higher eukaryotic cells; for example, the Drosophila penetratin targeting sequence. According to this embodiment, the fusion protein at least comprises the amino acid sequence:

CysArgGlnIleLysIleTrpPheGlnAsnArgArgMetLysTrpLysLys(Xaa)_(n)Cys

or a homologue, derivative or analogue thereof, wherein Xaa is any amino acid residue and n has a value greater than or equal to 1. Preferably, the value of n will be at least 5, more preferably between about 5 and about 20, even more preferably between about 15 and about 35 and still even more preferably between about 30 and about 50 and still more preferably between about 35 and about 55. In a still more preferred embodiment, the value of n is between at least about 40 and at least about 60.
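The layout of this fusion (a flanking cysteine, the penetratin sequence, n variable residues and a closing cysteine) can be illustrated in one-letter code. The Python sketch below is a hypothetical helper, not part of the disclosed method; the function names and the example insert are assumptions, and only the 20 standard residues are accepted for Xaa.

```python
import re

# One-letter form of ArgGlnIleLysIleTrpPheGlnAsnArgArgMetLysTrpLysLys.
PENETRATIN = "RQIKIWFQNRRMKWKK"
STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"

# Cys + penetratin + (Xaa)n with n >= 1 + Cys.
FUSION_PATTERN = re.compile(
    "C" + PENETRATIN + "([" + STANDARD_RESIDUES + "]+)C"
)

def make_fusion(insert):
    """Build Cys-penetratin-(Xaa)n-Cys around a library-derived insert."""
    return "C" + PENETRATIN + insert + "C"

def extract_insert(fusion):
    """Return the (Xaa)n portion, or None if the layout does not match."""
    match = FUSION_PATTERN.fullmatch(fusion)
    return match.group(1) if match else None

example_fusion = make_fusion("ACDEF")  # invented 5-residue insert
```

An empty insert is rejected by the pattern, which mirrors the requirement that n be greater than or equal to 1.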

Reference herein to a “promoter” is to be taken in its broadest context and includes the transcriptional regulatory sequences of a classical genomic gene, including the TATA box which is required for accurate transcription initiation in eukaryotic cells, with or without a CCAAT box sequence and additional regulatory elements (i.e. upstream activating sequences, enhancers and silencers). Promoters may also lack a TATA box motif but comprise one or more “initiator elements” or, as in the case of yeast-derived promoter sequences, comprise one or more “upstream activator sequences” or “UAS” elements. For expression in prokaryotic cells such as bacteria, the promoter should at least contain the −35 box and −10 box sequences.

A promoter is usually positioned upstream, or 5′, of a structural gene, the expression of which it regulates. Furthermore, the regulatory elements comprising a promoter are usually positioned within 2 kb of the start site of transcription of the gene.

In the present context, the term “promoter” is also used to describe a synthetic or fusion molecule, or derivative that confers, activates or enhances expression of the subject reporter molecule in a cell. Preferred promoters may contain additional copies of one or more specific regulatory elements, to further enhance expression of the gene and/or to alter the spatial expression and/or temporal expression. For example, in yeast, regulatory elements that confer galactose, phosphate or copper inducibility may be placed adjacent to a heterologous promoter sequence driving expression of the reporter, thereby conferring conditional inducibility on the expression of said gene by the addition of the appropriate inducer to the growth medium.

Placing a gene operably under the control of a promoter sequence means positioning the said gene such that its expression is controlled by the promoter sequence. Promoters are generally positioned 5′ (upstream) to the genes that they control. In the construction of heterologous promoter/structural gene combinations it is generally preferred to position the promoter at a distance from the gene transcription start site that is approximately the same as the distance between that promoter and the gene it controls in its natural setting, i.e., the gene from which the promoter is derived. As is known in the art, some variation in this distance can be accommodated without loss of promoter function. Similarly, the preferred positioning of a regulatory sequence element with respect to a heterologous gene to be placed under its control is defined by the positioning of the element in its natural setting, i.e., the genes from which it is derived. Again, as is known in the art, some variation in this distance can also occur.

Examples of promoters suitable for use in regulating the expression of the reporter molecule and/or amino acid sequence and/or the polypeptide binding partner in a cell include viral, fungal, yeast, insect, animal and plant derived promoters. Preferred promoters are capable of conferring expression in a eukaryotic cell, especially a yeast or mammalian cell. The promoter may regulate the expression of a gene constitutively, or differentially with respect to the tissue in which expression occurs, or with respect to the developmental stage at which expression occurs, or in response to external stimuli such as environmental stress or hormones, amongst others.

Particularly preferred promoters according to the present invention include those naturally-occurring and synthetic promoters which contain binding sites for transcription factors, more preferably for helix-loop-helix (HLH) transcription factors, zinc finger proteins, leucine zipper proteins and the like. Preferred promoters may also be synthetic sequences comprising one or more upstream operator sequences such as LexA operator sequences or activating sequences derived from any of the promoters referred to herein such as GAL4 DNA binding sites.

Those skilled in the art will recognise that the choice of promoter will depend upon the nature of the cell being transformed and the molecule to be expressed. Such persons will be readily capable of determining functional combinations of minimum promoter sequences and operators for cell types in which the inventive method is performed.

In a particularly preferred embodiment, the promoter is a yeast promoter, a mammalian promoter, or a bacterial or bacteriophage promoter sequence selected from the list comprising GAL1, CUP1, PGK1, ADH2, PHO5, PRB1, GUT1, SPO13, ADH1, CMV, SV40, T7, SP6, lac or tac promoter sequences.

Whilst the invention is preferably performed in yeast cells, the inventors clearly contemplate modifications wherein the invention is performed entirely in mammalian cells, utilising promoters that are operable in mammalian cells to drive expression of the various assay components, in combination with a counter-selective reporter gene operable in mammalian cells. Such embodiments are within the ken of those skilled in the art.

For expression in mammalian cells, it is preferred that the promoter is the CMV promoter sequence, more preferably the CMV-IE promoter, or alternatively the SV40 promoter and, in particular, the SV40 late promoter sequence. These and other promoter sequences suitable for expression of genes in mammalian cells are well known in the art.

Examples of mammalian cells contemplated herein to be suitable for expression include COS, VERO, HeLa, mouse C127, Chinese hamster ovary (CHO), WI-38, baby hamster kidney (BHK) or MDCK cell lines, amongst others. A wide variety of cell lines such as these are readily available to those skilled in the art.

The prerequisite for producing intact polypeptides in bacterial cells and, in particular, in Escherichia coli cells, is the use of a strong promoter with an effective ribosome binding site, such as a Shine-Dalgarno sequence, which may be incorporated into expression vectors carrying the first and second nucleotide sequences, or other genetic constructs used in performing the various alternative embodiments of the invention. Typical promoters suitable for expression in bacterial cells such as E. coli include, but are not limited to, the lacZ promoter, temperature-sensitive λ_(L) or λ_(R) promoters, SP6, T3 or T7 promoter or composite promoters such as the IPTG-inducible tac promoter. A number of other vector systems for expressing the nucleic acid molecule of the invention in E. coli are well known in the art and are described for example in Ausubel et al (1987) and/or Sambrook et al (1989). Numerous sources of genetic sequences suitable for expression in bacteria are also publicly available in various plasmid constructs, such as for example, pkC30 (λ_(L)), PKK173-3 (tac), pET-3 (T7) or the pQE series of expression vectors, amongst others.

Suitable prokaryotic cells for expression include Staphylococcus, Corynebacterium, Salmonella, Escherichia coli, Bacillus sp. and Pseudomonas sp., amongst others. Bacterial strains that are suitable for the present purpose are well known in the relevant art.

Where the promoter is intended to regulate expression of the reporter molecule, it is particularly preferred that said promoter include one or more recognition sequences for the binding of a DNA binding domain derived from a transcription factor, for example a GAL4 binding site or LexA operator sequence.

As used herein, the term “reporter molecule” shall be taken to refer to any molecule that is capable of producing an identifiable or detectable result.

In one embodiment of the invention, the reporter molecule is an enzyme, peptide, oligopeptide or polypeptide that comprises a visible product or at least, when incubated in the presence of a substrate molecule, can convert said substrate to a visible product, such that cells expressing the reporter molecule may be readily detected. For example, the expression of reporter genes that encode polypeptides, which themselves fluoresce, or cause fluorescence of a second molecule, can be operably connected to the biological activity being assayed, to facilitate the detection of cells wherein expression of the reporter molecule is present or absent. Such applications are particularly useful in high throughput drug screening approaches, wherein it is desirable to rapidly screen a large number of drug candidates for their agonist/antagonist properties with respect to the biological activity in question. Preferred reporter molecules according to this embodiment include, but are not limited to, the Escherichia coli β-galactosidase enzyme, the firefly luciferase protein and the green fluorescent protein or mutants thereof which possess red-shifted or blue-shifted emission spectra or enhanced output. Persons skilled in the art will be aware of how to utilise genetic sequences encoding such reporter molecules in performing the invention described herein, without undue experimentation. For example, the coding sequence of the gene encoding such a reporter molecule may be modified for use in the cell line of interest (eg. human cells, yeast cells) in accordance with known codon usage preferences. Additionally, the translational efficiency of mRNA derived from non-eukaryotic sources may be improved by mutating the corresponding gene sequence or otherwise introducing to said gene sequence a Kozak consensus translation initiation site.
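By way of illustration only, the presence of a favourable Kozak context around the start codon of a reporter coding sequence may be checked computationally. The following minimal sketch assumes the commonly cited criterion of a purine at position -3 and a G at position +4 relative to the A of the ATG; the function name and the test sequences are hypothetical and form no part of the specification.

```python
def has_strong_kozak(seq: str, atg_index: int) -> bool:
    """Return True if the ATG at atg_index has a purine at -3 and a G at +4.

    Assumption: "strong" context is judged only on the -3 and +4 positions,
    which is a simplification of the full Kozak consensus.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False  # not enough flanking sequence to judge
    minus3 = seq[atg_index - 3]  # position -3 relative to the A of ATG
    plus4 = seq[atg_index + 3]   # position +4, the base directly after ATG
    return minus3 in "AG" and plus4 == "G"
```

A reporter gene of bacterial origin that fails such a check is a candidate for the mutation of its initiation region described above.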

Preferably, the reporter molecule allows colorimetric identification of its expression either by direct fluorescence (eg. Green Fluorescent Protein) or by a change in colour in the presence of an appropriate substrate (eg. the production of a blue colour with β-galactosidase in the presence of the substrate 5-bromo-4-chloro-3-indolyl-β-D-galactoside (ie. X-GAL)).

Particularly preferred reporter molecules according to the present invention are those which produce altered cell growth or viability, including the ability to induce cell death. In the present context, the reporter molecule either comprises the first nucleic acid molecule or is encoded by said first nucleic acid molecule. Accordingly, those skilled in the art will be aware that the reporter molecule of such an embodiment is preferably a peptide, polypeptide, enzyme, abzyme or other protein molecule or alternatively, an isolated nucleic acid molecule.

Preferably, the reporter molecule of the invention is capable of directly or indirectly inhibiting, enhancing or otherwise modulating the growth and/or viability of the host cell. Direct modulation of cell growth and/or viability is where expression of the reporter molecule has a direct consequence on cell growth and/or viability. Indirect modulation of cell growth and/or viability is where expression of the reporter molecule has no direct consequence on cell growth and/or viability, however, said expression may modulate cell growth and/or viability when cells are cultured in the presence of a suitable co-factor or substrate molecule, amongst others.

Where the reporter molecule is a peptide, polypeptide, enzyme, abzyme or other protein molecule which comprises a cytostatic compound, anti-mitotic compound, toxin, mitogen or growth regulatory substance such as a hormone or protein which is essential to cell growth or viability, it may have a direct effect on cell growth or viability when expressed therein. Similarly, a reporter molecule which comprises a nucleic acid molecule may have a direct effect on cell growth and/or viability, for example wherein the reporter molecule is a ribozyme, antisense molecule, minizyme, or co-suppression molecule which is targeted to the expression of a gene which is capable of modifying cell growth and/or viability.

Wherein it is desirable for the reporter molecule to have an indirect effect on cell growth and/or viability, this may be achieved, for example by coupling expression of the reporter molecule to the production of a cytostatic compound, anti-mitotic compound, toxin or negative growth regulatory molecule.

In one embodiment, the reporter molecule is an enzyme which, when expressed in the host cell, catalyses the conversion of a substrate molecule which is not capable of altering or affecting cell growth and/or viability, to produce a product which comprises a toxin, cytostatic compound or anti-mitotic compound. According to this embodiment, the expression of the reporter molecule in the presence of said substrate leads to production of a sufficiently high concentration of the toxin, cytostatic compound or anti-mitotic compound to reduce cell growth or result in cell death.

In a further embodiment, the reporter molecule is an enzyme which, when expressed in the host cell, catalyses the conversion of a cytostatic or anti-mitotic substrate molecule to produce a product which is incapable of modifying cell growth and/or viability. According to this embodiment, cells incubated in the presence of the substrate molecule do not grow or divide as rapidly as cells that are not incubated therewith. Wherein cells incubated in the presence of the cytostatic or anti-mitotic substrate molecule express the reporter molecule, cell division and/or cell growth is resumed when the concentration of said substrate in said cell is reduced.

In an alternative embodiment, the reporter molecule directly or indirectly enhances cell growth and/or viability, for example by coupling expression of the reporter molecule to the production of a mitogen or positive growth regulatory molecule.

In a further embodiment, the reporter molecule is an enzyme which, when expressed in the host cell, catalyses the conversion of a first compound which is inactive in modulating cell growth and/or viability to produce a mitogen or positive growth regulatory molecule product. According to this embodiment, cells incubated in the presence of the substrate molecule grow or divide at a normal rate compared to other cells. Expression of the enzyme reporter molecule in the presence of the substrate molecule leads to enhanced cell growth and/or cell division as the concentration of the mitogen or positive growth regulatory molecule is increased in the cell. As a consequence, cells in which expression of the reporter molecule is enhanced as a result of the biological activity grow and/or divide more rapidly than the surrounding cells in the library, facilitating their detection.

In the context of the present invention, the amino acid sequence identified using the above method is capable of modulating the expression of the reporter molecule. Accordingly, the amino acid sequence may be an agonist or an antagonist of the biological activity under which expression of the reporter molecule is operably placed. Wherein the amino acid sequence is an agonist molecule, reporter molecule expression will be increased or enhanced or activated and, depending upon whether or not the reporter molecule directly or indirectly increases or reduces cell growth and/or viability, cell growth will be increased or reduced, respectively. In such embodiments of the invention however, it is clearly undesirable for the reporter molecule to result in cell death, because it would not be possible to recover the cells expressing the desired peptide. Wherein the amino acid sequence is an antagonist of the biological activity, reporter molecule expression will be decreased or repressed or inactivated and, depending upon whether or not the reporter molecule directly or indirectly increases or reduces cell growth and/or viability, cell growth will be reduced or increased, respectively. Wherein the reporter molecule leads directly or indirectly to cell death, antagonism of the biological activity by the antagonist amino acid sequence facilitates survival of the cell compared to cells which do not express the antagonist but express the reporter molecule.
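The four combinations set out in the preceding paragraph may be summarised as a small decision function. This is a sketch only; the function name and the string labels are hypothetical conveniences and are not terms used elsewhere in this specification.

```python
def growth_outcome(peptide_role: str, reporter_effect: str) -> str:
    """Predict the direction of change in host cell growth.

    peptide_role:    "agonist" or "antagonist" of the biological activity
                     that controls reporter expression.
    reporter_effect: "increases" or "reduces", i.e. what the expressed
                     reporter does to cell growth and/or viability.
    """
    if peptide_role not in ("agonist", "antagonist"):
        raise ValueError("peptide_role must be 'agonist' or 'antagonist'")
    if reporter_effect not in ("increases", "reduces"):
        raise ValueError("reporter_effect must be 'increases' or 'reduces'")
    # An agonist raises reporter expression; an antagonist lowers it.
    reporter_up = peptide_role == "agonist"
    reporter_promotes_growth = reporter_effect == "increases"
    # Growth rises when a growth-promoting reporter goes up, or when a
    # growth-reducing reporter goes down.
    return "increased" if reporter_up == reporter_promotes_growth else "reduced"
```

The antagonist/"reduces" combination, which returns "increased", corresponds to the counter-selection case in which cells expressing an antagonist peptide survive.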

Examples of suitable yeast positive selectable reporter genes (suitable for isolation of peptide agonists) include but are not limited to HIS3 and LEU2, the protein products of which allow cells expressing these reporter genes to survive on appropriate cell culture medium. Conversely, several yeast counter-selectable reporter genes (suitable for isolation of peptide antagonists) exist, including the URA3 gene, wherein URA3 expression is toxic to a cell expressing this gene, in the presence of the drug 5-fluoro-orotic acid (5FOA). Other counter-selectable reporter genes include CYH2 and LYS2, which confer lethality in the presence of the drugs cycloheximide and alpha-aminoadipate (αAA), respectively. For counter selection in bacteria corresponding reporter genes encoding toxic products are available, including SacB, CcdB and the mammalian GATA-1 gene, the expression of which is toxic in E. coli.
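The pairing of yeast counter-selectable reporter genes with their selective agents may be captured in a simple lookup, sketched below. The dictionary and function names are hypothetical. The cycloheximide marker is keyed as CYH2, the designation used for that reporter later in this specification, and the survival rule is a deliberate simplification (a cell dies only when the reporter is expressed and the drug is present).

```python
# Counter-selectable yeast reporter genes and the drug that makes their
# expression lethal, as listed in the text.
COUNTER_SELECTION = {
    "URA3": "5-fluoro-orotic acid (5FOA)",
    "CYH2": "cycloheximide",
    "LYS2": "alpha-aminoadipate",
}

def survives(reporter_gene: str, reporter_expressed: bool, drug_present: bool) -> bool:
    """Simplified survival rule for a cell on counter-selection medium."""
    if reporter_gene not in COUNTER_SELECTION:
        raise KeyError(f"{reporter_gene} is not a listed counter-selectable gene")
    # Lethality requires both reporter expression and the cognate drug.
    return not (reporter_expressed and drug_present)
```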

Standard methods are used to introduce the first and second nucleotide sequences into the cellular host. In the case of yeast cells, this may be achieved by mass mating or transformation.

In one embodiment, the first and second nucleotide sequences are each contained within a separate genetic construct, further comprising a selectable marker gene to facilitate detection of transformed cells, for example an antibiotic resistance selectable marker gene. Preferably, the selectable marker genes for each genetic construct are different, such that detection of the presence of one or both genetic constructs in a single cell may be facilitated. The first and second nucleotide sequences may thus be introduced into the cellular host by shotgun cotransformation and selection on appropriate media to select for the presence of both selectable marker genes.

Alternatively, the first and second nucleic acid sequences may be introduced by sequential transformation, accompanied by selection for the appropriate marker genes after each transformation event.

Alternatively, the first and second nucleotide sequences may be introduced into separate populations of host cells which are subsequently mated and those cell populations containing both nucleotide sequences are selected on media permitting growth of host cells successfully transformed with both first and second nucleic acid molecules.

Alternatively, the first and second nucleotide sequences may be contained on a single genetic construct and introduced into the host cell population in a single step. In such an embodiment of the invention, the random peptide library is usually produced using a vector which at least comprises the first nucleotide sequence placed operably under control of a suitable promoter with or without operator sequence, and a selectable marker gene, the insertion site for the second nucleotide sequence being selected such that the inserted second nucleotide sequence is capable of being expressed.

These embodiments are in addition to the steps to be performed in relation to the introduction of one or more further nucleic acid molecules that encode one or more polypeptide binding partners of the biological activity, variations of which are described supra.

The selected host cells can be screened on media comprising the components required to utilise the counter-selectable reporter molecule. Host cells expressing a peptide that inhibits the biological activity are unable to adequately transcribe the counter-selectable reporter gene thereby permitting the host cell to live in the selection medium. Those host cells expressing amino acid sequences that are unable to inhibit the biological activity transcribe the reporter gene thereby resulting in the formation of a product that is toxic to the host cell in the presence of the selection medium.

The genetic construct may be in the form of an autonomously replicating vector or may comprise genetic sequences to facilitate integration into a host cell genome.

Alternatively, the first nucleotide sequence encoding the reporter molecule can be integrated into the chromosome of the host cell by homologous recombination of the products of polymerase chain reaction (PCR), or of sequences on another DNA molecule that is incapable of replicating autonomously in yeast cells.

According to the nature of the biological activity of interest, the first nucleotide sequence may be placed operably in connection with any promoter sequence, the only requirement being that the promoter is capable of regulating gene expression in the host cell selected. Usually, the host cell will be varied to suit the promoter sequence. The present invention clearly extends to the isolation of peptides capable of modulating any biological activity.

In fact, the present invention will facilitate the identification and isolation of amino acid sequences that modulate or mediate expression of a reporter molecule by agonising or antagonising any regulatory step which is required for expression to occur, not merely steps later in the signal transduction pathway, such as DNA-protein interactions or interactions between transcription factors. Wherein it is desired to isolate a specific amino acid sequence which is capable of modulating a particular biological activity, it is necessary only to operably connect expression of the first nucleotide sequence to the biological activity of interest. This is done by placing the first nucleotide sequence operably in connection with a promoter sequence which is regulated by the biological activity or alternatively, genetically manipulating a promoter sequence which is operably connected to the first nucleic acid molecule thereby placing the promoter under operable control of the biological activity.

In the case of amino acid sequences that modulate or mediate a protein:DNA interaction which is required for gene expression or the modulation of gene expression, for example to isolate a peptide molecule which interacts directly with a cis-acting enhancer or silencer element or a protein to which said element binds, this objective may be achieved by introducing the cis-acting element into a promoter sequence to which the first nucleotide sequence is operably connected. By this means, expression of the reporter molecule is placed operably under the control of the cis-acting element and modulation of gene expression will occur when the appropriate protein molecule either binds to the cis-acting DNA element or to the protein that recognises said element.

In the case of a protein:protein interaction controlling gene expression, the promoter controlling the expression of the first nucleic acid molecule is selected such that it contains the necessary cis-acting elements to which at least one of the proteins involved in the interaction binds. Where there is not complete knowledge of the cis-acting sequences or trans-acting factors involved in regulating gene expression, but the promoter sequence and cell-type in which expression occurs are known, the first nucleotide sequence may be placed operably in connection with that promoter sequence and the resulting nucleic acid molecule introduced into that cell type. Such a relationship forms the basis of “two-hybrid” screening approaches. Wherein the peptide of interest antagonises or agonises any step required for expression or the activation, repression or enhancement of gene expression, the effect will be identified by recording altered expression of the reporter molecule.

The present invention further contemplates the detection of amino acid sequences that modulate a biological activity, in a mammalian cell, wherein expression of the counter-selectable reporter gene is placed operably under the control of a mammalian-expressible promoter sequence, which is aberrantly active in the pathogenic situation, for example an oncogene promoter such as MYC. Activity of such a promoter would be blocked directly in cells expressing an amino acid sequence capable of inhibiting the oncogene promoter in a mammalian cell.

In a preferred aspect of the sixth embodiment there is provided a method for identifying an amino acid sequence which is capable of antagonising a protein:protein interaction in a host cell, said method comprising the steps of:

-   (i) producing a peptide library in a cellular host wherein (a) the transformed cells of said library contain at least a first nucleotide sequence which comprises or encodes a reporter molecule capable of reducing the growth and/or viability of said host cell, the expression of which is operably under control of said protein:protein interaction and a second nucleotide sequence derived from a defined genomic sequence which is capable of encoding said amino acid sequence when placed operably under the control of a suitable promoter sequence and wherein (b) substantially all of the defined genomic sequence is present within the population of transformed cells making up said library;
-   (ii) culturing said cellular host for a time and under conditions sufficient for expression of said second nucleotide sequence to occur; and
-   (iii) selecting cells wherein expression of said reporter molecule is antagonised, repressed or reduced.
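The requirement in step (i) that substantially all of the defined genomic sequence be present in the library can be related to library size. The sketch below uses the standard Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), which is not recited in this specification and is offered only as an assumption-laden guide: it supposes randomly generated fragments of uniform length and ignores reading frame and orientation. The function name and the example figures are hypothetical.

```python
import math

def clones_needed(genome_bp: int, insert_bp: int, probability: float) -> int:
    """Estimate the number of independent clones needed so that any given
    sequence is represented with the stated probability (Clarke-Carbon).

    Assumes random, uniformly sized fragments; frame and orientation of
    the insert are not taken into account.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be shorter than the genome")
    fraction = insert_bp / genome_bp  # fraction of the genome per clone
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))

# Illustrative figures only: a 1 Mb genome cut into 1 kb fragments.
print(clones_needed(1_000_000, 1_000, 0.99))
```

For expression libraries the practical number is higher, since only a proportion of inserts will be in the correct orientation and reading frame.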

Preferably, the subject method includes the additional first step or later step of introducing into the cellular host one or more further nucleic acid molecules which encode one or more polypeptide binding partners which are involved in the biological activity, operably under the control of one or more promoter sequences. Such embodiments are described in detail supra.

According to this embodiment, it is preferred that the reporter molecule comprise a peptide, polypeptide, enzyme, or other protein molecule which is capable of converting an innocuous substrate molecule into a cytostatic compound, anti-mitotic compound or a toxin, such that antagonised expression of the reporter molecule by the subject peptide prevents cell death or at least prevents a reduction in cell growth and/or viability in the presence of the substrate. More preferably, in the yeast system, the reporter gene is URA3 and/or CYH2, amongst others such as LYS2. In a particularly preferred embodiment, the reporter molecule is the product of the URA3 gene which, when expressed, converts 5-fluoroorotic acid (5-FOA) to a toxic product.

One exemplification of this embodiment takes advantage of the fact that most active eukaryotic transcription activators are modular and comprise a DNA binding domain and a DNA activation domain, wherein the DNA binding domain and the DNA activation domain may be contained on the same protein molecule or alternatively, on separate molecules which interact to regulate gene expression. According to this embodiment, the expression of the reporter molecule is placed operably under the control of a protein:protein interaction, for example between the oncogenic proteins SCL and LMO2 which bind to form an active artificial transcription factor. The transcription of the reporter gene can therefore be used as an indicator of two proteins interacting where one of said proteins of interest comprises at least a DNA binding domain and binds to an operator promoter element upstream of the reporter gene and said other protein of interest comprises at least a DNA activation domain. Binding of the DNA binding protein to the operator, in the presence of a functional activation domain, initiates transcription of the reporter gene. The URA3 reporter thereby acts as a counter selectable marker.

This embodiment of the invention may be adapted to the identification of amino acid sequences which modulate other protein:protein interactions, by functionally replacing the DNA binding domain of a transcription factor with a different DNA binding domain which is specific for a different cis-acting element in the promoter regulating expression of the reporter molecule. Methods for the production of such fusion proteins are well known to those skilled in the art. In such cases, the selection of an appropriate DNA binding domain will depend on the nature of the DNA binding site located upstream of the reporter gene.

For example, fusion proteins may be constructed between an oncoprotein and a DNA binding domain and/or a DNA activation domain. For example, a sequence of nucleotides encoding or complementary to a sequence of nucleotides encoding residues 176 to 331 of SCL may be fused to the LexA DNA binding domain and a nucleotide sequence encoding LMO2 may be fused to a DNA activation domain (or vice-versa).

The present invention is also particularly useful for identifying amino acid sequences that inhibit protein:protein interactions which normally produce deleterious effects (apart from the deleterious effect of certain reporter molecules), for example interactions involving oncogene products. Specific examples of oncogenes, the products of which form transcription factors contributing to tumorigenesis, include SCL and any one or more of DRG, E47 and/or LMO2.

In a further aspect of the sixth embodiment there is provided a method for identifying an amino acid sequence that is capable of modifying a protein:protein interaction in a host cell, said method comprising the steps of:

-   (i) producing a peptide library in a host wherein (a) the transformed cells of said library contain: (1) at least a first nucleotide sequence which comprises or encodes a reporter molecule wherein said nucleotide sequence is operably connected to an operator sequence or transcription factor binding site; (2) a second nucleotide sequence derived from a defined genomic sequence which encodes said amino acid sequence when placed operably under the control of a suitable promoter sequence; and (3) one or more further third nucleotide sequences which encode one or more polypeptides, proteins or fusion proteins wherein at least one of said polypeptides, proteins or fusion proteins includes at least one DNA binding domain capable of binding to said operator sequence or transcription factor binding site and at least one of said polypeptides, proteins or fusion proteins includes at least one DNA activation domain or derivative thereof capable of activating the expression of said first nucleotide sequence when targeted to the promoter/operator by interaction with another protein bearing the cognate DNA binding domain; and (b) substantially all of the defined genomic sequence is present within the population of transformed cells making up said library;
-   (ii) culturing said host cell for a time and under conditions sufficient to permit expression of said second and further nucleotide sequences to occur; and
-   (iii) selecting cells wherein expression of said reporter molecule is activated, inhibited or otherwise modified.

The proteins involved in the biological activity of interest, which are encoded by the second nucleic acid molecule, are synthesised in the host cell, either encoded by one or more foreign nucleotide sequences transformed into the host cell or integrated into the genome of said cell. However, the present invention clearly extends to situations in which these sequences are also encoded by endogenous host cell genes.

According to this embodiment, the DNA binding domain binds to the operator sequence and, in the presence of the DNA activating region, expression of the reporter molecule occurs. Wherein the second nucleotide sequence encodes a peptide that antagonises or inhibits DNA binding and/or DNA activation, expression of the reporter molecule is repressed, reduced or otherwise inhibited. Alternatively, wherein the second nucleotide sequence encodes an amino acid sequence that agonises or enhances DNA binding and/or DNA activation, expression of the reporter molecule is activated, enhanced or otherwise increased.

Those skilled in the art will recognise that the DNA binding domain and the DNA activation domain may be contained on a single amino acid molecule or alternatively, they may be contained in separate amino acid molecules that interact with each other to regulate reporter gene expression.

Similarly, the first and/or second and/or further nucleotide sequences may be contained in a single nucleic acid molecule, for example in one genetic construct or alternatively, one, two, three or more of said sequences may be contained on separate nucleic acid molecules. Wherein one or more of the nucleotide sequences are contained on separate nucleic acid molecules, then each such nucleotide sequence is further preferably operably connected to its own promoter sequence. Alternatively, where any two or more of the nucleotide sequences are contained on the same nucleic acid molecule, the nucleotide sequences may be expressed under the control of a single promoter or alternatively, under the control of separate promoter sequences.

Those skilled in the art will recognise that the alternatives described supra are equally applicable to this embodiment of the invention.

In a further preferred aspect of the sixth embodiment, the subject method further comprises the step of isolating the second nucleotide sequence from the host cell and sequencing the nucleic acid molecule and deriving the amino acid sequence encoded thereby. Once the sequence has been identified it can then be compared to like sequences within the known nucleotide sequence to identify those amino acids which are essential for modulation of biological activity. Synthetic peptides may then be produced, based upon the derived amino acid sequence thus obtained. Those skilled in the art are well versed in such techniques.
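The step of deriving the encoded amino acid sequence from the sequenced insert can be illustrated with a minimal translation routine. The sketch assumes the standard genetic code and a known reading frame; the function name and the example insert are hypothetical, and established sequence analysis software would ordinarily be used in practice.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varying slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA insert in the given frame (0, 1 or 2), stopping at
    the first stop codon. Unrecognised codons are reported as 'X'."""
    dna = dna.upper()
    peptide = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)

# Hypothetical insert: ATG GCC TAA translates to Met-Ala followed by a stop.
print(translate("ATGGCCTAA"))  # MA
```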

The present invention also contemplates amino acid sequences identified by the method of the present invention.

Preferably the amino acid sequences are agonists or antagonists of protein:protein or protein:DNA interactions. More preferably, the peptides, oligopeptides and polypeptides of the present invention are antagonists of protein:protein interactions or protein:DNA interactions and even more preferably, antagonists of protein:protein interactions.

In a particularly preferred embodiment, the peptides of the invention antagonise or inhibit interactions that produce deleterious effects in eukaryotic cells, in particular human or animal cells. More preferably, the amino acid sequences of the invention antagonise or inhibit interactions which involve one or more oncoproteins.

The present invention clearly contemplates the use of said amino acid sequences or fragments or derivatives thereof in the prophylactic or therapeutic treatment of humans or animals. Methods of treatment include their use in antibiotic peptide therapy regimens such as in the treatment protocols for patients with bacterial, fungal or viral infections. Their use in treatment protocols for said patients includes their administration as a means of inhibiting the growth of the infecting microorganism and/or inhibiting its virulence. The use of such peptides in potentiating the effects of other antimicrobial agents is also envisaged (eg. see international PCT application: WO 96/24684).

Accordingly, another aspect of the present invention contemplates a pharmaceutical composition comprising a peptide, oligopeptide or polypeptide that is capable of modulating a biological activity and one or more pharmaceutically acceptable carriers and/or diluents.

A preferred embodiment contemplates a pharmaceutical composition wherein said peptide, oligopeptide or polypeptide antagonises the growth and/or virulence of a pathogen, and one or more pharmaceutically acceptable carriers and/or diluents. These components are referred to as the active ingredients.

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions (where water-soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersion or may be in the form of a cream or other form suitable for topical application. Alternatively, injectable solutions may be delivered encapsulated in liposomes to assist their transport across the cell membrane. Alternatively or in addition such preparations may contain constituents of self-assembling pore structures to facilitate transport across the cellular membrane. It must be stable under the conditions of manufacture and storage and must be preserved against the contaminating/destructive action of environmental microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol and liquid polyethylene glycol, and the like), suitable mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be brought about by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the injectable compositions can be brought about by the use in the compositions of agents delaying absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required amount in the appropriate solvent with various of the other ingredients enumerated above, as required, followed by filtered sterilization. Generally, dispersions are prepared by incorporating the various sterilized active ingredients into a sterile vehicle which contains the basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and the freeze-drying technique which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example, with an inert diluent or with an assimilable edible carrier, or it may be enclosed in hard or soft shell gelatin capsule, or it may be compressed into tablets, or it may be incorporated directly with the food of the diet. For oral therapeutic administration, the active compound may be incorporated with excipients and used in the form of ingestible tablets, buccal tablets, troches, capsules, elixirs, suspensions, syrups, wafers, and the like. Such compositions and preparations should contain at least 1% by weight of active compound. The percentage of the compositions and preparations may, of course, be varied and may conveniently be between about 5 to about 80% of the weight of the unit. The amount of active compound in such therapeutically useful compositions is such that a suitable dosage will be obtained. Preferred compositions or preparations according to the present invention are prepared so that a dosage unit form contains between about 0.1 μg and 20 g of active compound.

The tablets, troches, pills, capsules and the like may also contain the components as listed hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like; a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or saccharin may be added or a flavouring agent such as peppermint, oil of wintergreen, or cherry flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of the above type, a liquid carrier. Various other materials may be present as coatings or to otherwise modify the physical form of the dosage unit. For instance, tablets, pills, or capsules may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound, sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and flavouring such as cherry or orange flavour. Of course, any material used in preparing any dosage unit form should be pharmaceutically pure and substantially non-toxic in the amounts employed. In addition, the active compound(s) may be incorporated into sustained-release preparations and formulations.

The present invention also extends to forms suitable for topical application such as creams, lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and the like. The use of such media and agents for pharmaceutical active substances is well known in the art. Except insofar as any conventional media or agent is incompatible with the active ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active ingredients can also be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specification for the novel dosage unit forms of the invention is dictated by and directly dependent on (a) the unique characteristics of the active material and the particular therapeutic effect to be achieved, and (b) the limitations inherent in the art of compounding such an active material for the treatment of disease in living subjects having a diseased condition in which bodily health is impaired as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form. A unit dosage form can, for example, contain the principal active compound in amounts ranging from 0.5 μg to about 2000 mg. Expressed in proportions, the active compound is generally present in from about 0.5 μg to about 2000 mg/ml of carrier. In the case of compositions containing supplementary active ingredients, the dosages are determined by reference to the usual dose and manner of administration of the said ingredients.

The pharmaceutical composition may also comprise genetic molecules such as a vector capable of transfecting target cells where the vector carries a nucleic acid molecule capable of inhibiting such deleterious biological interaction/activities. The vector may, for example, be a viral vector.

EXAMPLES

Further features of the present invention are more fully described in the following non-limiting Examples. It is to be understood, however, that this detailed description is included solely for the purposes of exemplifying the present invention. It should not be understood in any way as a restriction on the broad description of the invention as set out above.

Methods of molecular cloning, immunology and protein chemistry that are not explicitly described in the following examples are reported in the literature and are known by those skilled in the art. General texts that describe conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art include, for example: Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989); Glover ed., DNA Cloning: A Practical Approach, Volumes I and II, IRL Press, Ltd., Oxford, U.K. (1985); and Ausubel, F., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., Struhl, K. Current Protocols in Molecular Biology, Greene Publishing Associates/Wiley Intersciences, New York.

Example 1 The Construction of Biodiverse Gene Fragment Libraries

The genomic DNA of a diverse panel of microorganisms, chosen to maximise the genetic diversity across the panel, was reduced to fragments suitable for expressing peptides. Techniques suitable for achieving this outcome include mechanical shearing, partial DNase I digestion and the use of combinations of restriction endonucleases.

Each genome was then added to the pool in direct proportion to its size and complexity. More DNA of large genomes was added than small genomes to ensure adequate representation.
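The proportional pooling described above can be illustrated with a minimal sketch. The genome names and sizes below are hypothetical placeholders, not values from this specification, and the sketch assumes that "size and complexity" can be approximated by genome size alone.

```python
# Hypothetical genome sizes in megabases (illustrative placeholders only).
genome_sizes_mb = {"genome_A": 1.8, "genome_B": 4.6, "genome_C": 12.0}


def pool_masses(sizes_mb, total_ng):
    """Split total_ng of DNA across genomes in direct proportion to genome size."""
    total_size = sum(sizes_mb.values())
    return {name: total_ng * size / total_size for name, size in sizes_mb.items()}


# Assumed total pool mass of 1000 ng, chosen only for illustration.
masses = pool_masses(genome_sizes_mb, total_ng=1000.0)
for name, ng in masses.items():
    print(f"{name}: {ng:.1f} ng")
```

Under this assumption each genome contributes the same mass of DNA per megabase, so a given length of sequence is expected to be equally represented whichever genome it comes from.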

A peptide library was then constructed by digesting aliquots of the pooled DNA with all 6 combinations of 2 restriction enzymes from a set containing Alu I, Bst U I, Hae III and Rsa I. These enzymes are blunt-cutting restriction endonucleases which have distinct 4 base pair recognition sequences, and thus each combination will produce fragments with sizes in the 90-120 bp range predominating. These are suitable for cloning, and the length of DNA is sufficient to encode peptides of about 30 amino acid residues, which is in the range of the sizes of sequences reported in structurally conserved regions of proteins. In instances where linkers rather than adaptors are to be ligated to the genomic fragments, the genomic digest pool may be protected from subsequent digestion by treatment with an appropriate methylase (in this example EcoR1 methylase).
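The double-digest scheme above can be sketched in silico. The recognition sequences used are the standard ones for these enzymes (AluI AG^CT, BstUI CG^CG, HaeIII GG^CC, RsaI GT^AC, each cutting at the centre of its site). The random test sequence is only a stand-in for pooled genomic DNA, and all function names are illustrative.

```python
from itertools import combinations
import random

# Recognition sites; each enzyme cuts at the midpoint of its site, leaving blunt ends.
SITES = {"AluI": "AGCT", "BstUI": "CGCG", "HaeIII": "GGCC", "RsaI": "GTAC"}


def cut_positions(seq, site):
    """Return the blunt cut coordinate for every occurrence of site in seq."""
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + len(site) // 2)
        start = seq.find(site, start + 1)
    return cuts


def double_digest(seq, enzyme_a, enzyme_b):
    """Cut seq with two enzymes and return the resulting fragments in order."""
    cuts = sorted(set(cut_positions(seq, SITES[enzyme_a]) +
                      cut_positions(seq, SITES[enzyme_b])))
    bounds = [0] + cuts + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:])]


# All pairs of two enzymes drawn from the set of four.
pairs = list(combinations(SITES, 2))

# Random sequence used purely as a placeholder for real genomic DNA.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(50_000))

for a, b in pairs:
    fragments = double_digest(genome, a, b)
    in_range = [f for f in fragments if 90 <= len(f) <= 120]
    print(f"{a} + {b}: {len(fragments)} fragments, {len(in_range)} of 90-120 bp")
```

A 90 bp fragment corresponds to 30 codons, which is consistent with the peptide length of about 30 residues stated above.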

The digest fragments were purified by native acrylamide gel electrophoresis followed by gel filtration chromatography.

Linkers were then ligated onto the DNA fragments by standard methods. 3 reading frames of linkers were used. Where the fragments are to be cloned into an EcoR1 site, equimolar amounts of the following 3 self-annealing linkers may be used:

d(pGGAATTCC), d(pCGGAATTCCG) and d(pCCGGAATTCCGG)
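As a quick check of why these three linkers cover three reading frames, the sketch below confirms that each linker equals its own reverse complement (so it can self-anneal) and counts the bases left upstream of the insert after cleavage at G^AATTC. The helper names are illustrative and not part of the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


LINKERS = ["GGAATTCC", "CGGAATTCCG", "CCGGAATTCCGG"]
ECORI_SITE = "GAATTC"  # EcoRI cleaves between G and AATTC

self_annealing = []
frame_offsets = []
for linker in LINKERS:
    self_annealing.append(linker == reverse_complement(linker))
    # Bases remaining between the cut and the insert once the linker is cleaved.
    retained = linker[linker.index(ECORI_SITE) + 1:]
    frame_offsets.append(len(retained) % 3)
    print(linker, self_annealing[-1], retained, frame_offsets[-1])
```

The retained stubs differ in length by one base each, so the three linkers place a given insert in three different frames relative to the vector sequence.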

Where the cloning was intended to be directional, an equimolar amount of another linker corresponding to the second 3′ restriction site was added. For example, for cloning into the EcoR1 and HindIII sites of a vector, a number of moles equal to that of the combined EcoR1 linkers of the following linker was added to the ligation: d(pCCAAGCTTGG).

Linkers were then digested with the restriction endonuclease(s) corresponding to their recognition sequences and appropriately sized digest fragments were purified by standard techniques including agarose gel electrophoresis, sucrose or potassium acetate gradients, or size exclusion chromatography.

The genomic fragments which contain flanking linkers or adaptors (see example 4 below) were then cloned into a pT7-Select expression vector by standard methodology for library construction.

Example 2 BGF Library Construction

Biodiverse gene fragment libraries can be constructed using adapted fragments of pooled genomic DNA from an evolutionarily diverse set of compact genomes. To maximise the diversity of the pool, the relative concentration of DNA in the pool from larger genomes can be increased in proportion to the total haploid genome size. The genomic inserts can be fragmented using mechanical shearing (e.g. sonication) followed by repair and ligation of linker oligonucleotides or adaptors. Alternatively, they can be made by polymerase extension of partially degenerate oligonucleotides annealed to the denatured genomic DNA, followed by amplification using the polymerase chain reaction (PCR).

In this example the oligonucleotides used in the primary extension with the Klenow fragment of DNA polymerase I (at 15-25 degrees Celsius) had the sequence:

(Using * to represent a universal base such as 5-nitro-indole)
Forward primer: GACTACAAGGACGACGACGACAAGNNNNNNNN*
Reverse primer: ATTCCCGGGAAGCTTATCAATCAATCANNNNNNNN*

N corresponded to degenerate nucleotides (e.g. either dATP, dCTP, dGTP or dTTP). Moreover, either of the universal bases deoxyinosine or 5-nitroindole (or a functionally equivalent analogue) can be substituted at any or all of the ‘N’ positions of the primer, especially at the 3′ terminal position. The length of the ‘N’ series can also be varied from 6 to 8 nucleotides.

According to this example, the primers for the nested PCR amplification of the product of the Klenow extension reaction were:

Forward primer: GAGAGGAATTCAGACTACAAGGACGACGACGACAAG
Reverse primer: GAGAGAATTCCCGGGAAGCTTATCAATCAATCA
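The relationship between the extension primers and the nested PCR primers can be checked with a short sketch. It tests whether each nested primer ends with the fixed (non-degenerate) part of the matching extension primer and reports which restriction sites each nested primer carries. The site sequences for EcoRI, XmaI and HindIII are the standard ones; the variable names are illustrative.

```python
# Fixed 5' portions of the Klenow extension primers (degenerate tail omitted).
EXT_FWD_FIXED = "GACTACAAGGACGACGACGACAAG"
EXT_REV_FIXED = "ATTCCCGGGAAGCTTATCAATCAATCA"

# Nested PCR primers as given in the text.
NESTED_FWD = "GAGAGGAATTCAGACTACAAGGACGACGACGACAAG"
NESTED_REV = "GAGAGAATTCCCGGGAAGCTTATCAATCAATCA"

RESTRICTION_SITES = {"EcoRI": "GAATTC", "XmaI": "CCCGGG", "HindIII": "AAGCTT"}


def sites_in(primer):
    """Return the names of the restriction sites present in primer."""
    return {name for name, site in RESTRICTION_SITES.items() if site in primer}


fwd_nested_ok = NESTED_FWD.endswith(EXT_FWD_FIXED)
rev_nested_ok = NESTED_REV.endswith(EXT_REV_FIXED)
fwd_sites = sites_in(NESTED_FWD)
rev_sites = sites_in(NESTED_REV)

print("forward primer anneals to extension tag:", fwd_nested_ok, sorted(fwd_sites))
print("reverse primer anneals to extension tag:", rev_nested_ok, sorted(rev_sites))

# Number of distinct sequences covered by eight fully degenerate positions.
print("degenerate 8-mers:", 4 ** 8)
```

Because the nested primers match only the fixed tags, amplification should be restricted to products of the Klenow extension, and the added 5′ bases supply the cloning sites.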

The PCR amplification was performed using a ‘Touchdown’ protocol with ‘hot-start’ enzyme to maximise specificity.

The initial extension and PCR amplification were performed entirely with Klenow polymerase, adding more polymerase each cycle as in the initial report of PCR. This allows the entire cycling to be performed between the denaturation temperature (90-100 degrees Celsius) and a low annealing temperature (15-25 degrees Celsius), minimising the potential annealing bias against amplification of A/T rich sequences. For this approach the primers had the form:

Forward primer: GAGAATTCANNNNNNNN*
Reverse primer: GAGAATTCNNNNNNNN*

Methylated nucleotides can be included in the PCR reaction (but not incorporated into the primers) to protect the products from internal cleavage with restriction enzymes during cloning.

In a preferred form of the example, mutagenic PCR using alternative nucleotides and/or the use of a manganese buffer can also be employed to increase the sequence diversity of inserts.

The resultant PCR products were digested with EcoR1 alone or EcoR1 and Xma1 (where the reverse primer contains an Xma1 site) prior to cloning into vectors of the pBLOCK series.

Libraries were constructed according to standard methodology using the highest efficiency commercially available competent cells, viz. XL10-Gold (Stratagene), to ensure complexities greater than 10⁷ independent clones.

Example 3 Mimotope Libraries Using Biodiverse Gene Fragments

This example illustrates the detection of mimotopes from the major house dust mite allergen Der p 1.

DNA in the 90-120 bp range of each of the double digests was isolated and pooled, ligated to linkers in all reading frames and cloned into the phage display vector T7Select 1.1 or the vector T7Select 415. Some DNA fragments were outside the 90-120 bp range and were not cloned, but the redundancy in the digestion procedures should allow a representation of most sequences. The use of a pool of 3 reading frames of linkers and/or a translational slippage signal in the construction of the library ensured that all 6 reading frames of the inserts were represented. The total genome size of a biodiverse panel of microorganisms was approximately 35 Mb. This procedure generated about 12×10⁶ different fragments, allowing for cloning in all reading frames and orientations. Allowing for the latter, about one sixth of the sequences encoded natural peptides. The T7Select is a molecular cloning system with high packaging efficiency and is designed to display the peptides encoded by the cloned DNA as C-terminal fusions on a phage coat protein which is accessible for affinity purification procedures. A minority of the unnatural peptides were smaller than the estimated size range because they were truncated by stop codons. The T7Select 1.1 and T7Select 415 vectors display the peptide in low and high copy number respectively, so both high and low affinity interactions can be used for affinity purification.

Further diversity was generated by PCR mutagenesis, in which the amplification is conducted under conditions which favour high error rates. It has been calculated that, with an error rate of 0.5 bases per 100 bp/cycle (which can be achieved), eight mutagenic cycles produce base changes in 90% of the PCR products and almost 50% will have 2 or 3 substitutions. Linkers were added to provide the primer sequences for the PCR and a final high fidelity PCR was performed with linkers extended to provide cloning sites. The mutated fragments had a 10× diversification of the sequences in an amount of DNA which was readily packaged.
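The quoted figures can be approximated with a simple Poisson model. This is only one plausible reading, not the calculation used in the specification: it assumes substitutions accumulate independently, and that a strand in the final product has on average been copied in about half of the cycles. That fraction is an assumption introduced here (`mean_copy_fraction`).

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def mutation_profile(length_bp, errors_per_100bp_per_cycle=0.5, cycles=8,
                     mean_copy_fraction=0.5):
    """Return (mean substitutions, P(at least one), P(two or three)) per fragment.

    mean_copy_fraction is an assumed average fraction of cycles in which a
    final strand was actually copied; it is not stated in the text.
    """
    lam = (errors_per_100bp_per_cycle * (length_bp / 100)
           * cycles * mean_copy_fraction)
    p_any = 1 - poisson_pmf(0, lam)
    p_two_or_three = poisson_pmf(2, lam) + poisson_pmf(3, lam)
    return lam, p_any, p_two_or_three


for length in (90, 100, 120):
    lam, p_any, p_23 = mutation_profile(length)
    print(f"{length} bp: mean {lam:.2f}, "
          f"P(>=1) {p_any:.2f}, P(2 or 3) {p_23:.2f}")
```

Under these assumptions a 100-120 bp fragment carries at least one substitution in roughly 86-91% of products and 2 or 3 substitutions in roughly 45-47%, which is in line with the figures quoted above.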

Phage from the libraries constructed above which display peptides which bind to human and murine IgG and IgE anti-Der p 1 were isolated using methods based on those described for pollen allergens [11] and other antigens. They are essentially standard protocols for affinity purifying phage displaying antigens. Such methods have been described for filamentous phage display systems and the T7Select cloning system.

Antibody was affinity purified from Der p 1-coupled Sepharose™ and used to coat ELISA wells to immunoselect phage by a panning procedure. Several cycles of selection and phage amplification were performed as recommended. Several types of affinity purification methods have been used for selecting phage so there is scope to use a variety of procedures. Human IgE antibodies were isolated from the serum of allergic subjects and IgG from the serum of allergic and nonallergic subjects. Monoclonal mouse IgG antibodies which are known to recognize different epitopes were used to isolate peptides which mimic different epitopes.

Following selection and amplification of the phage displaying the peptides, further purification may be obtained using plaque immunoassays performed with anti-Der p 1 antibodies [11; 12]. Such a procedure enables the isolation of individual clones reacting with the antibodies. Crossover immunoassays were performed with different human and mouse antibody preparations to estimate the frequency of, and to isolate, shared peptide mimotopes. Phage were then selected for further study based on the sequence of the peptide, the serological reactivity and intended use. The specificity of the antibody mimotope interaction was tested by inhibition assays against other purified mite allergens and by Western blotting of antiserum against complex protein sources, allergen and microbial extracts.

The antibody binding activity of mimotopes can often be improved by fine adjustments of the amino acid sequence. Clones encoding peptides reacting with anti-Der p 1 were optimized for antibody binding by random mutagenesis using PCR enhanced for mismatching by Mn⁺⁺ and high nucleotide concentrations. The sequences flanking the cloning site were used as the primers. A final high fidelity PCR, using the primers extended to contain the restriction enzyme sequences for recloning into the display vector, was performed for cloning. The phage containing the mutated inserts were then used to infect E. coli and produce plaques for immunoscreening. Clones showing the highest antibody binding activity were picked.

The peptides identified by the described purification procedure were tested for their ability not only to mimic an epitope of the Der p 1 allergen but to be a mimotope which can immunise animals or humans to induce anti-Der p 1 antibodies. This was performed in the following ways: with a synthetic peptide chemically coupled to an immunogenic carrier, with peptide genetically fused to a carrier by molecular cloning techniques and by using the phage displaying the peptide as immunogens.

The ability of the peptides to bind to IgE against the Der p 1 allergen can be used for diagnostic techniques which not only detect the presence of antibody but which can also show the diversity of the immune response and pattern of epitope recognition. The ability to act as a mimotope and induce anti-allergen IgG antibody can be used in several immunotherapeutic strategies. Importantly, constructs can be produced to enable the peptide to be used as a monovalent immunogen and thus prevent the allergic reactions caused by cross-linking IgE molecules in allergic patients.

Example 4 Screening Peptide Libraries Encoding Biodiverse Gene Fragments for Antimicrobial Agents

To isolate novel peptides with antibiotic activity against a multi-resistant Staphylococcus aureus strain, the following approach was used.

A biodiverse gene fragment library having a complexity greater than 1,000,000 individual clones was first made by the procedures described in example 1 in a T7-phage vector. Examples of T7-phage vectors that can be used in this part of the method include T7Select415-1b, T7Select1-1b, T7Select1-2a, T7Select1-2b and/or T7Select1-2c (Novagen).

The library was plated out at a multiplicity of at least one on a lawn of either E. coli BL21 (in the case of T7Select415-1b) or either of the complementing hosts E. coli BLT5403 or E. coli BLT5615 (for the other vectors), to allow a plaque density of below semi-confluence.

The plates used were double-sided, being made in a fashion resembling dual culture plates joined together by the underside. Such plates therefore had two lids on opposite faces. The adjoining face of the two sides of the plates was made of nitrocellulose or nylon membrane, supported by a grid made of a rigid material such as plastic. The opposite side of the plate to the side containing the BL21-derived T7 plaque overlay contained media suitable for the growth of Staphylococcus aureus. Following the plating of the library, the Staphylococcus aureus was plated on the face of the plate containing the appropriate media at the minimum density required to obtain a lawn.

The plates were then incubated at 37 degrees Celsius until the T7 phage plaques and the Staphylococcus aureus lawn appeared. Any discontinuities in the lawn of Staphylococcus aureus can reflect the diffusion of an inhibitory drug produced by a phage plaque at a corresponding position on the opposite side of the plate. The plaques were then purified to clonality and tested for inhibitory properties in subsequent secondary, tertiary and/or quaternary screens.

The inserts from pure plaques were then amplified using PCR and sequenced using vector primers. The inserts of the clones were then subcloned and purified by standard bacterial expression methodology using vectors such as pET14b, pMAL-c2 or pTYB1, and tested for minimal inhibitory concentration (MIC) by methods known to those skilled in the art.

The sequence of inhibitory peptides can then be used to design synthetic peptide-based candidate drugs which would be tested for antimicrobial activity against Staphylococcus aureus.

Example 5 Selecting Blockers of Protein/Protein Interactions from Biodiverse Gene Fragment Libraries in Yeast

Reverse two hybrid libraries were constructed and screened using the vector pBLOCK-1 as described in our earlier specification (see PCT/AU99/00018) using genomic inserts prepared as described supra in example 1, with the addition of EcoR1 linkers, and cloned into the EcoR1 site of the vector.

Obvious variations of this method will be known to those skilled in the art, such as the possibility of using adaptors instead of linkers, of using alternative cloning sites and of including additional sequences in the linkers. For example, a pool of the following annealed adaptors could be used in place of the linkers (each strand of the adaptor sequence is shown reading 5′ to 3′):

Adaptor 1
AATTCAATCAATCACACACAGGAGGCCACCATGGATGCATGTGTGCAC
GTGCACACATGCATCCATGGTGGCCTCCTGTGTGTGATTGATTG

Adaptor 2
AATTCAATCAATCACACACAGGAGGCCACCATGGATGCATGTGTGCA
TGCACACATGCATCCATGGTGGCCTCCTGTGTGTGATTGATTG

Adaptor 3
AATTCAATCAATCACACACAGGAGGCCACCATGGATGCATGTGTGC
GCACACATGCATCCATGGTGGCCTCCTGTGTGTGATTGATTG

Adaptors such as those shown here can encode motifs useful for expression or conformational constraint (e.g. in this case, dual Shine-Dalgarno and Kozak sequences, flanking cysteine residues and stop codons).
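The structure of these adaptors can be examined with a short sketch. The division of each adaptor into two strands is inferred here from base-pairing (the second strand is the reverse complement of the first minus its 5′ AATT overhang) and should be treated as an assumption. The motif strings used for the Shine-Dalgarno (AGGAGG) and Kozak (GCCACCATGG) elements are conventional consensus sequences, not sequences quoted from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (top strand, bottom strand), both 5' to 3'; strand split inferred by complementarity.
ADAPTORS = {
    "Adaptor 1": ("AATTCAATCAATCACACACAGGAGGCCACCATGGATGCATGTGTGCAC",
                  "GTGCACACATGCATCCATGGTGGCCTCCTGTGTGTGATTGATTG"),
    "Adaptor 2": ("AATTCAATCAATCACACACAGGAGGCCACCATGGATGCATGTGTGCA",
                  "TGCACACATGCATCCATGGTGGCCTCCTGTGTGTGATTGATTG"),
    "Adaptor 3": ("AATTCAATCAATCACACACAGGAGGCCACCATGGATGCATGTGTGC",
                  "GCACACATGCATCCATGGTGGCCTCCTGTGTGTGATTGATTG"),
}

SHINE_DALGARNO = "AGGAGG"   # assumed consensus motif
KOZAK = "GCCACCATGG"        # assumed consensus motif, ATG at offset 6

report = {}
for name, (top, bottom) in ADAPTORS.items():
    overhang = top[:len(top) - len(bottom)]
    anneals = reverse_complement(bottom) == top[len(overhang):]
    atg = top.index(KOZAK) + 6
    codons = [top[i:i + 3] for i in range(atg, len(top) - 2, 3)]
    report[name] = {
        "overhang": overhang,
        "anneals": anneals,
        "has_sd": SHINE_DALGARNO in top,
        "has_cys_codon": "TGT" in codons or "TGC" in codons,
        "bases_after_atg": (len(top) - atg) % 3,
    }
    print(name, report[name])
```

On this reading each adaptor has an EcoR1-compatible AATT overhang at one end and a blunt end at the other, and the three adaptors differ by one base at the blunt end, so the pool places the insert in three frames relative to the ATG.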

The library was transformed or mated into a yeast strain containing the two proteins whose interaction one intends to block and containing counter-selectable reporter genes whose expression is dependent on that interaction. Detailed methodology for reverse two hybrid screening is described in our specification PCT/AU99/00018.

REFERENCES

1. Tiozzo, E., Rocco, G., Tossi, A. & Romeo, D. (1998). Biochemical and Biophysical Research Communications, 249, 202-206.
2. Balaban, N., Goldkorn, T., Nhan, R., Dang, L., Scott, R M, R., Rasooly, A., Wright, S., Larrick, J., Rasooly, R. & Carlson, J. (1998). Science, 280, 438-440.
3. Colas, P., Cohen, B., Jessen, T., Grishina, I., McCoy, J., Brent, R. (1996). Nature, 380, 548-550.
4. Xu, C., Mendelsohn, A. & Brent, R. (1997). Proc. Natl. Acad. Sci. USA, 94, 12473-12478.
5. Kolonin, M. & Finley, R. (1998). Proc. Natl. Acad. Sci. USA, 95, 14266-14271.
6. Derossi, D., Joliot, A. H., Chassaing, G., Prochiantz, A. (1994). Journal of Biological Chemistry, 269, 10444-10450.
7. Phelan, A. (1998). Nature Biotechnology, 16, 440-443.
8. Marcello, A., Loregion, A., Cross, A., Marsden, H., Hirst, T., Palu, G. (1994). Proc. Natl. Acad. Sci. USA, 91, 8994-8998.
9. Fahraeus, E., Paramio, J. M., Ball, K. L., Lain, S., Lane, D. P. (1996). Current Biology, 6, 84-91.
10. Mennuni, C., Santini, C., Lazzaro, D., Dott, F., Farilla, L., Fierabracci, A., Bottazzo, G. F., Di Mario, U., Cortese, R. & Luzzago, A. (1997). Journal of Molecular Biology, 268, 599-606.
11. Leitner, A., Vogel, M., Radauer, C., Breiteneder, H., Stadler, B. M., Scheiner, O., Kraft, D. & Jensen-Jarolim, E. (1998). European Journal of Immunology, 28, 2921-7.
12. Pincus, S. H., Smith, M. J., Jennings, H. J., Burritt, J. B. & Glee, P. M. (1998). Journal of Immunology, 160, 293-8.

1-23. (canceled)
 24. A gene fragment expression library comprising a plurality of different nucleotide sequences from a plurality of biodiverse organisms each having a sequenced genome and wherein the organisms are microorganisms and/or eukaryotes having compact genomes.
 25. The library according to claim 24, wherein the sequence fragments of the library are from organisms selected from the group consisting of Fugu rubripes, Caenorhabditis elegans, Saccharomyces cerevisiae, E. coli, Aquifex aeolicus, Methanococcus jannaschii, Bacillus subtilis, Haemophilus influenzae, Helicobacter pylori, Neisseria meningitidis, Synechocystis sp., Bordetella pertussis, Pasteurella multocida, Pseudomonas aeruginosa, Borrelia burgdorferi, Methanobacterium thermoautotrophicum, Mycoplasma pneumoniae, Archaeoglobus fulgidus and Vibrio harveyi.
 26. The gene fragment expression library according to claim 24 constructed using adapted fragments of pooled genomic DNA from an evolutionarily diverse panel of compact genomes.
 27. The expression library according to claim 24 comprising fragments of DNA from a diverse panel of microorganisms.
 28. The expression library according to claim 24 comprising adapted fragments of pooled genomic DNA from an evolutionarily diverse set of compact genomes.
 29. The expression library according to claim 25 wherein the nucleic acid fragments of the library comprise 90 base pairs to 120 base pairs in length.
 30. The expression library according to claim 25 wherein the nucleic acid fragments of the library are of sufficient length to encode peptides comprising about 30 amino acids in length.
 31. A method of producing a gene fragment expression library comprising a plurality of different nucleotide sequences from different organisms, said method comprising producing nucleotide sequence fragments from a nucleotide sequence of known nucleotide composition wherein said nucleotide sequence is from a sequenced genome of a microorganism and/or a sequenced compact genome of a eukaryotic species.
 32. The method according to claim 31 comprising: (i) producing defined fragments of DNA from a diverse panel of microorganisms; (ii) pooling the fragments in direct proportion to the size and complexity of each genome; and (iii) inserting the combined fragments into an expression vector.
 33. The method of claim 32 wherein the nucleic acid fragments are 90 base pairs to 120 base pairs in length.
 34. The method of claim 32 comprising digesting the pooled fragments with one or more restriction endonucleases to produce nucleic acid fragments of 90 base pairs to 120 base pairs in length.
 35. The method according to claim 33 or 34 wherein the nucleic acid fragments are of sufficient length to encode peptides comprising about 30 amino acids in length.
 36. The method according to claim 32 comprising: (i) producing adapted fragments of pooled genomic DNA from an evolutionarily diverse set of compact genomes wherein the relative concentration of DNA in the pool from larger genomes is increased in proportion to the total haploid genome size; and (ii) inserting the combined fragments into an expression vector.
 37. The method according to claim 36 comprising fragmenting the pooled genomic DNA by mechanical shearing.
 38. The method of claim 36 comprising producing the pooled genomic DNA by polymerase extension of partially degenerate oligonucleotides annealed to denatured genomic DNA and amplification using polymerase chain reaction (PCR).
 39. The method of claim 38 wherein PCR is mutagenic PCR.